Abstract



We demonstrate the separation of the complexity class NP from its subclass P.
Throughout our proof, we observe that the ability to compute a property on
structures in polynomial time is intimately related to an atypical property of
the space of solutions: namely, the space is parametrizable with only
$c^{\mathrm{poly}(\log n)}$ parameters instead of the $c^n$ parameters typically
required for a joint distribution of n covariates.

This type of exponentially smaller parametrization arises as a result of severe
limitations placed on the interaction between the variates. In particular, it may
arise from range limited interactions, where variates interact at short ranges
and chain together such interactions to create long range interactions. Such
long range interactions would then be characterized by the statistical notions
of conditional independence and sufficient statistics. The presence of
conditional independencies manifests in the form of economical parametrizations
of the joint distribution of covariates. Likewise, such economical
parametrizations can arise from value limited interactions, which take only a
limited number of joint values. In both cases, the result on the joint
distribution is the same: it is parametrizable with only
$c^{\mathrm{poly}(\log n)}$ independent parameters. In order to apply this analysis to the space
of solutions of random constraint satisfaction problems, we utilize and expand 
upon ideas from several fields spanning logic, statistics, graphical models, ran- 
dom ensembles, and statistical physics. 

We begin by introducing the requisite framework of graphical models for a 
set of interacting variables. We focus on the correspondence between Markov 
and Gibbs properties for directed and undirected models as reflected in the fac- 
torization of their joint distribution, and the number of independent parameters 
required to specify the distribution. 

Next, we build the central contribution of this work. We show that there are
fundamental conceptual relationships between polynomial time computation,
which is completely captured by the logic FO(LFP) on classes of successor
structures, and poly(log n)-parametrization. In particular, monadic LFP is a
range limited interaction model that possesses certain directed Markov
properties that may be stated in terms of conditional independence and
sufficient statistics. In order to demonstrate these relationships, we view the
LFP computation as "factoring through" several stages of first order
computations, and then utilize the limitations of first order logic.
Specifically, we exploit the limitation that first order logic can only express
properties in terms of a bounded number of local neighborhoods of the
underlying structure. Then we relate complex fixed points to value limited
interactions, which again result in poly(log n)-parametrization.

Next we introduce ideas from the 1RSB (one-step replica symmetry breaking)
ansatz of statistical physics. We recall the description of the clustered phase
for random k-SAT that arises when the clause density is sufficiently high and
k > 9. In this phase, known as the d1RSB phase, an arbitrarily large fraction of
all variables in cores freeze within exponentially many clusters in the
thermodynamic limit, as the clause density is increased towards the SAT-unSAT
threshold. The Hamming distance between a solution that lies in one cluster and
one that lies in another is O(n). Note that the onset of this phase is
rigorously proven only for k > 9, and it is here that we will demonstrate our
separation.

Next, we encode k-SAT formulae as structures on which FO(LFP) captures
polynomial time. By asking FO(LFP) to extend partial assignments on ensembles
of random k-SAT, we build distributions of solutions. We then construct a
dynamic graphical model on a product space that captures all the information
flows through the various stages of an LFP computation on ensembles of k-SAT
structures. Distributions computed by LFP must satisfy this model. This model
is directed, which allows us to compute factorizations locally and parametrize
using Gibbs potentials on cliques. We then use results from ensembles of factor
graphs of random k-SAT to bound the various information flows in this directed
graphical model. We parametrize the resulting distributions in a manner that
demonstrates that irreducible interactions between covariates (namely, those
that may not be factored any further through conditional independencies)
cannot grow faster than poly(log n) in the range limited, monadic
LFP-computed distributions. For value limited complex LFP, we show how to
obtain a parametrization of the solution space by merging potentials with
scope O(n).
This allows us to analyze the behavior of the entire class of polynomial time 
algorithms on ensembles simultaneously. 

Using the aforementioned limitations of LFP, we demonstrate that a purported
polynomial time solution to k-SAT would result in a solution space that is a
mixture of distributions, each having an exponentially smaller parametrization
than is consistent with the highly constrained d1RSB phases of k-SAT. We show
that this would contradict the behavior exhibited by the solution space in the
d1RSB phase. This corresponds to the intuitive picture provided by physics
about the emergence of extensive (meaning O(n)) long-range correlations between
variables in this phase, and also explains the empirical observation that all
known polynomial time algorithms break down in this phase.

Our work shows that every polynomial time algorithm must fail to produce
solutions to large enough problem instances of k-SAT in the d1RSB phase. This
shows that polynomial time algorithms are not capable of solving NP-complete
problems in their hard phases, and demonstrates the separation of P from NP.
1. Introduction



The P = NP question is generally considered one of the most important and 
far reaching questions in contemporary mathematics and computer science. 

The origin of the question seems to date back to a letter from Gödel to von
Neumann in 1956 [Sip92]. Formal definitions of the class NP awaited work by
Edmonds [Edm65], Cook [Coo71], and Levin [Lev73]. The Cook-Levin theorem
showed the existence of complete problems for this class, and demonstrated
that SAT (the problem of determining whether a set of clauses of Boolean
literals has a satisfying assignment) was one such problem. Later, Karp [Kar72]
showed that twenty-one well known combinatorial problems, which include
Travelling Salesman, Clique, and Hamiltonian Circuit, were also
NP-complete. In subsequent years, many problems central to diverse areas of
application were shown to be NP-complete (see [GJ79] for a list). If P ≠ NP,
we could never solve these problems efficiently. If, on the other hand, P = NP,
the consequences would be even more stunning, since every one of these
problems would have a polynomial time solution. The implications of this for
applications such as cryptography, and for the general philosophical question
of whether human creativity can be automated, would be profound.

The P = NP question is also singular in the number of approaches that
researchers have brought to bear upon it over the years. From the initial
question in logic, the focus moved to complexity theory, where early work used
diagonalization and relativization techniques. However, [BGS75] showed that
these methods were perhaps inadequate to resolve P = NP by demonstrating
relativized worlds in which P = NP and others in which P ≠ NP (both relations
for the appropriately relativized classes). This shifted the focus to methods
using circuit complexity, and for a while this approach was deemed the one most
likely to resolve the question. Once again, a negative result in [RR97] showed
that a class of techniques known as "Natural Proofs" that subsumed the above
could not separate the classes NP and P, provided one-way functions exist.

Owing to the difficulty of resolving the question, and also to the negative 
results mentioned above, there has been speculation that resolving the P = 
NP question might be outside the domain of mathematical techniques. More 
precisely, the question might be independent of standard axioms of set theory. 
The first such results in [HH76] show that some relativized versions of the P = 
NP question are independent of reasonable formalizations of set theory. 

The influence of the P = NP question is felt in other areas of mathematics. 
We mention one of these, since it is central to our work. This is the area of de- 
scriptive complexity theory — the branch of finite model theory that studies the 
expressive power of various logics viewed through the lens of complexity the- 
ory. This field began with the result [Fag74] that showed that NP corresponds 
to queries that are expressible in second order existential logic over finite struc- 
tures. Later, characterizations of the classes P [Imm86], [Var82] and PSPACE 
over ordered structures were also obtained. 

There are several introductions to the P = NP question and the enormous
amount of research that it has produced. The reader is referred to [Coo06] for
an introduction, which also serves as the official problem description for the
Clay Millennium Prize. An excellent older review is [Sip92]. See [Wig07] for a
more recent introduction. Most books on theoretical computer science in general,
and complexity theory in particular, also contain accounts of the problem and
attempts made to resolve it. See the books [Sip97] and [BDG95] for standard
references.

Preliminaries and Notation 

Treatments of standard notions from complexity theory, such as definitions of 
the complexity classes P, NP, PSPACE, and notions of reductions and com- 
pleteness for complexity classes, etc. may be found in [Sip97, BDG95]. 
Our work will span various developments in three broad areas. While we
have endeavored to be relatively complete in our treatment, we feel it would 
be helpful to provide standard textual references for these areas, in the order 
in which they appear in the work. Additional references to results will be pro- 
vided within the chapters. 

Standard references for graphical models include [Lau96] and the more re- 
cent [KF09]. For an engaging introduction, please see [Bis06, Ch. 8]. For an 
early treatment in statistical mechanics of Markov random fields and Gibbs dis- 
tributions, see [KS80]. 

Preliminaries from logic, such as notions of structure, vocabulary, first order 
language, models, etc., may be obtained from any standard text on logic such 
as [Hod93]. In particular, we refer to [EF06, Lib04] for excellent treatments of 
finite model theory and [Imm99] for descriptive complexity. 

For a treatment of the statistical physics approach to random CSPs, we rec- 
ommend [MM09]. An earlier text is [MPV87]. 

1.1 Synopsis of Proof 

This proof requires a convergence of ideas and an interplay of principles that 
span several areas within mathematics and physics. This represents the major- 
ity of the effort that went into constructing the proof. Given this, we felt that 
it would be beneficial to explain the various stages of the proof, and highlight 
their interplay. The technical details of each stage are described in subsequent 
chapters. 

Consider a system of n interacting variables such as is ubiquitous in the
mathematical sciences. For example, these may be the variables in a k-SAT instance
that interact with each other through the clauses present in the k-SAT formula, 
or n Ising spins that interact with each other in a ferromagnet. For ease of pre- 
sentation, we will assume our variables are binary. Through their interaction, 
variables exert an influence on each other, and affect the values each other may 
take. The proof centers on the study of logical and algorithmic constructs where 
such complex interactions have "simple" descriptions.

What constitutes a simple description of the interaction of n variables? The 
number of independent parameters required to specify the joint distribution is a 
measure of the complexity of interactions between the covariates. There are 
two components to this. The first measures correlations, and the second mea- 
sures "ampleness" under those correlations. This is best explained with two 
examples. Consider first the uniform distribution over all binary pairs 

{(0,0), (0,1), (1,0), (1,1)} 

There is no correlation between the two variables in this distribution. They are 
independent. Consider next the distribution over 5 covariates which is uni- 
formly supported only on 

(0,0,0,0,0) and (1,1,1,1,1). 

In this distribution, the covariates are tightly correlated, but the distribution is 
not "ample". A distribution over n covariates is defined to be ample when it is
supported on $c^n$ points for some constant $c > 1$.

Though initially these two distributions appear quite different, there is a 
commonality. Both can be specified with just two parameters. In the first exam- 
ple, the two parameters are the probability of the first variate and the probability 
of the second variate taking the value 1. With this much information, we can 
specify the joint distribution since the variates are independent. 

In the second example, we again need two parameters to specify the distri- 
bution — namely, the two points on which it is supported. 
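To make the parameter counts concrete, the sketch below (Python; all helper names are hypothetical and introduced only for illustration) builds both example distributions explicitly and compares them with the $2^n - 1$ free numbers that a general joint distribution on $n$ binary covariates needs. It assumes the first example is written as the product of two fair Bernoulli variates, which is exactly the uniform distribution on the four pairs.

```python
from itertools import product

def full_joint_param_count(n):
    # A general pmf on {0,1}^n has 2**n entries summing to 1: 2**n - 1 free numbers.
    return 2 ** n - 1

def independent_joint(probs):
    # Range limited case: n numbers p_i = P(X_i = 1) fix the whole joint distribution.
    joint = {}
    for x in product((0, 1), repeat=len(probs)):
        mass = 1.0
        for xi, p in zip(x, probs):
            mass *= p if xi == 1 else 1.0 - p
        joint[x] = mass
    return joint

def uniform_on_support(points):
    # Value limited case: listing the support points fixes the (uniform) joint.
    return {tuple(pt): 1.0 / len(points) for pt in points}

# First example: uniform distribution on the four binary pairs.
pairs = independent_joint([0.5, 0.5])
# Second example: five covariates supported only on 00000 and 11111.
tied = uniform_on_support([(0,) * 5, (1,) * 5])

print(pairs)                      # each of the four pairs has mass 0.25
print(tied)                       # two points, each with mass 0.5
print(full_joint_param_count(5))  # 31 numbers for a general joint on 5 bits
```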

Though both distributions have simple descriptions, the reasons are very
different. We will study distributions on n covariates that require only
$2^{\mathrm{poly}(\log n)}$ parameters to specify. We will call such
distributions poly(log n)-parametrizable. We will see that such distributions
are at the heart of polynomial time computability. Conversely, in hard phases
of constraint satisfaction problems such as k-SAT, the space of solutions is
both correlated and ample. This causes all polynomial time algorithms to fail
on them.
A distribution is simple to describe if there is either independence between
the variates (as was the case in our first example) or limited support of the
distribution (as was the case in our second example). We call the first case a
range limited interaction because variates interact with a limited range of
other variates. The second case is called value limited since the number of
joint values the variates can take is limited. The common feature underlying
both cases is that the distribution has a very economical parametrization as
compared to "true" joint distributions (more precisely, statistically typical
joint distributions) on n covariates, which require $O(2^n)$ parameters to
specify. Thus, we wish to study such distributions, and will consider both the
cases of range and value limited interactions.

At this point, we visit the topic of graphical interaction models and
conditional independence, which is a manifestation of range limited interactions. While
complete independence between variates in a complex system is rare, condi- 
tional independence between blocks of variables is fairly frequent. We see that 
factorization into conditionally independent pieces manifests in terms of eco- 
nomical parametrizations of the joint distribution. Graphical models offer us a 
way to measure the size of these interactions. 

The factorization of interactions can be represented by a corresponding
factorization of the joint distribution of the variables over the space of
configurations of the n variables subject to the constraints of the problem. It
has long been realized in the statistics and physics communities that certain multivariate
distributions decompose into the product of a few types of factors, with each 
factor itself having only a few variables. Such a factorization of joint distribu- 
tions into simpler factors can often be represented by graphical models whose 
vertices index the variables. A factorization of the joint distribution according to 
the graph implies that the interactions between variables can be factored into a 
sequence of "local interactions" between vertices that lie within neighborhoods 
of each other. 
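A standard way to quantify this economy, sketched here under the assumption of a directed factorization into local conditional distributions with q states per variable (the helper names are hypothetical), is to count free parameters: a variable v with parent set pa(v) contributes $(q-1)\,q^{|\mathrm{pa}(v)|}$ numbers. For a chain of n binary variables this gives $2n - 1$, against $2^n - 1$ for an unrestricted joint distribution.

```python
def factorized_param_count(parents, num_states=2):
    # Free parameters of a pmf that factors as prod_v P(x_v | x_parents(v)):
    # each node contributes (q - 1) numbers for every joint value of its parents.
    q = num_states
    return sum((q - 1) * q ** len(pa) for pa in parents.values())

def chain_parents(n):
    # Path X_0 -> X_1 -> ... -> X_{n-1}: each variable depends on its predecessor only.
    return {i: ([i - 1] if i > 0 else []) for i in range(n)}

n = 20
print(factorized_param_count(chain_parents(n)))  # 39 = 2n - 1
print(2 ** n - 1)                                # 1048575 for an unrestricted joint
```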

Consider the case of an undirected graphical model. The factoring of
interactions may be stated in terms of either a Markov property or a Gibbs
property with respect to the graph. Specifically, the local Markov property of
such models states that the conditional distribution of a variable, given all
the other variables, depends only on the values of its neighbors in an
appropriate neighborhood system. Of course, two variables arbitrarily far apart
can influence each other, but only through a sequence of successive local
interactions. The global Markov property for such models states that when two
sets of vertices are separated by a third, this induces a conditional
independence on variables corresponding to these sets of vertices, given those
corresponding to the third set. On the other hand, the Gibbs property of a
distribution with respect to a graph asserts that the distribution factors into
a product of potential functions over the maximal cliques of the graph. Each
potential captures the interaction between the set of variables that form the
clique. The Hammersley-Clifford theorem states that a positive distribution
having the Markov property with respect to a graph must have the Gibbs property
with respect to the same graph.
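The following sketch (pure Python; the potential values are arbitrary illustrative numbers, not taken from this work) builds a Gibbs distribution on the path $X_1 - X_2 - X_3$ from positive potentials on its two maximal cliques, and checks numerically the conditional independence that the global Markov property predicts, since $\{X_2\}$ separates $X_1$ from $X_3$.

```python
from itertools import product

# Hypothetical positive pair potentials on the two maximal cliques of X1 - X2 - X3.
psi_12 = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 0.5, (1, 1): 3.0}
psi_23 = {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 2.0, (1, 1): 0.5}

# Gibbs distribution: product of clique potentials, normalized by the partition function Z.
weights = {x: psi_12[x[0], x[1]] * psi_23[x[1], x[2]] for x in product((0, 1), repeat=3)}
Z = sum(weights.values())
P = {x: w / Z for x, w in weights.items()}

def cond_indep_13_given_2(P, tol=1e-12):
    # Check P(x1, x3 | x2) == P(x1 | x2) * P(x3 | x2) for every value of x2.
    for x2 in (0, 1):
        p2 = sum(p for x, p in P.items() if x[1] == x2)
        for x1, x3 in product((0, 1), repeat=2):
            joint = P[(x1, x2, x3)] / p2
            m1 = sum(p for x, p in P.items() if x[0] == x1 and x[1] == x2) / p2
            m3 = sum(p for x, p in P.items() if x[2] == x3 and x[1] == x2) / p2
            if abs(joint - m1 * m3) > tol:
                return False
    return True

print(cond_indep_13_given_2(P))  # True: {X2} separates X1 from X3
```

Only the easy direction (Gibbs implies Markov) is exercised here; the Hammersley-Clifford theorem supplies the converse for positive distributions.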

The condition of positivity is essential in the Hammersley-Clifford theorem 
for undirected graphs. However, it is not required when the distribution satis- 
fies certain directed models. In that case, the Markov property with respect to 
the directed graph implies that the distribution factorizes into local conditional 
probability distributions (CPDs). Furthermore, if the model is a directed acyclic 
graph (DAG), we can obtain the Gibbs property with respect to an undirected 
graph constructed from the DAG by a process known as moralization. We will 
return to the directed case shortly. 
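Moralization itself is a simple graph operation: connect every pair of parents that share a child, then drop the edge directions. In the moral graph each variable together with its parents forms a clique, so every local CPD $P(x_v \mid x_{\mathrm{pa}(v)})$ can serve as a potential on a clique. A minimal sketch, with a hypothetical `moralize` helper operating on a parent-list representation of the DAG:

```python
from itertools import combinations

def moralize(parents):
    # parents: dict mapping each node of a DAG to the list of its parents.
    # Returns the undirected edge set of the moral graph.
    edges = set()
    for child, pa in parents.items():
        # Keep every directed edge parent -> child, forgetting its direction.
        for p in pa:
            edges.add(frozenset((p, child)))
        # "Marry" the parents: join every pair of parents of a common child.
        for a, b in combinations(pa, 2):
            edges.add(frozenset((a, b)))
    return edges

# V-structure A -> C <- B: moralization adds the edge A - B.
dag = {"A": [], "B": [], "C": ["A", "B"]}
moral = moralize(dag)
print(sorted(tuple(sorted(e)) for e in moral))  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
```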

Chapter 2 develops the principles underlying the framework of graphical 
models. We will not use any of these models in particular, but construct another 
directed model on a larger product space that utilizes these principles and tailors 
them to the case of least fixed point logic, which we turn to next. 

At this point, we change to the setting of finite model theory. Finite model 
theory is a branch of mathematical logic that has provided machine indepen- 
dent characterizations of various important complexity classes including P, 
NP, and PSPACE. In particular, the class of polynomial time computable 
queries on successor structures has a precise description: it is the class of
queries expressible in the logic FO(LFP), which extends first order logic with
the ability to compute least fixed points of positive first order formulae. Least fixed
point constructions iterate an underlying positive first order formula, thereby 
building up a relation in stages. We take a geometric picture of a monadic LFP 
computation. Initially the relation to be built is empty. At the first stage, certain 
elements, whose types satisfy the first order formula, enter the relation. This 
changes the neighborhoods of these elements, and therefore in the next stage, 
other elements (whose neighborhoods have been thus changed in the previous 
stages) become eligible for entering the relation. The positivity of the formula 
implies that once an element is in the relation, it cannot be removed, and so 
the iterations reach a fixed point in a polynomial number of steps. Importantly
from our point of view, the positivity and the stage-wise nature of LFP mean
that the computation has a directed representation on a graphical model that we 
will construct. Recall at this stage that distributions over directed models enjoy 
factorization even when they are not defined over the entire space of configura- 
tions. 
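As a toy illustration of the stage-wise picture (reachability is a standard example of a monadic LFP query; it is not the k-SAT encoding used later, and the function name is hypothetical), the stages of the positive formula $\varphi(R, x) \equiv x = s \lor \exists y\,(R(y) \land E(y, x))$ can be computed directly. Because R occurs only positively, the stages increase, so on a universe of size n the fixed point is reached after at most n proper stages.

```python
def lfp_stages(universe, edges, source):
    # Monadic least fixed point of the positive formula
    #   phi(R, x) := (x = source) or exists y (R(y) and E(y, x)).
    # Returns the stages R^0 = {}, R^1, ..., ending at the fixed point.
    stages = [frozenset()]
    while True:
        current = stages[-1]
        nxt = frozenset(
            x for x in universe
            if x == source or any((y, x) in edges for y in current)
        )
        if nxt == current:  # fixed point reached
            return stages
        stages.append(nxt)

universe = range(6)
edges = {(0, 1), (1, 2), (2, 3), (4, 5)}
for i, stage in enumerate(lfp_stages(universe, edges, source=0)):
    print(i, sorted(stage))
```

Each element entering at a given stage does so because an element of its neighborhood entered at the previous stage, which is the "chain of local interactions" described above.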

We may interpret this as follows: monadic LFP relies on the assumption that 
variables that are highly entangled with each other due to constraints can be 
disentangled in a way that they now interact with each other through condi- 
tional independencies induced by a certain directed graphical model construc- 
tion. Of course, an element does influence others arbitrarily far away, but only 
through a sequence of such successive local and bounded interactions. The reason LFP 
computations terminate in polynomial time is analogous to the notions of con- 
ditional independence that underlie efficient algorithms on graphical models 
having sufficient factorization into local interactions. 

In order to apply this picture in full generality to all LFP computations, we
use the simultaneous induction lemma to push all simultaneous inductions into
nested ones, and then employ the transitivity theorem to encode nested fixed
points as sections of a single relation of higher arity. We then see that this
is the case of a value limited interaction between O(n) variates. Namely,
although n variates interact with each other, they do not take $c^n$ joint
values. Building the machinery that can precisely map all these cases to the
picture of either factorization into range limited or value limited
interactions is the subject of Chapter 5.

The preceding insights now direct us to the setting necessary in order to 
separate P from NP. We need a regime of NP-complete problems where inter- 
actions between variables have the following two properties. 

1. They are so "dense" that they cannot be factored through the bottleneck of 
the local and bounded properties of first order logic that limit each stage 
of LFP computation. 

2. The distribution is ample. Namely, it takes $c^n$ joint values.

Intuitively, this should happen when each variable has to simultaneously
satisfy constraints involving an extensive (O(n)) fraction of the variables in
the problem, and blocks of n variables are instantiated in exponentially many
distinct ways under these strong correlations. Namely, we have ample and highly
correlated distributions having no factorization into conditionally independent
pieces (remember that the value limited case is already ruled out since the
distribution is ample).

In search of regimes where such situations arise, we turn to the study of
random k-SAT ensembles, where the properties of the ensemble are studied as a
function of the clause density parameter. We will now add ideas from this
field, which lies at the intersection of statistical mechanics and computer
science, to the set of ideas in the proof.
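For concreteness, the ensemble can be sampled as follows. This is a sketch of the usual random k-SAT model with clause density $\alpha = m/n$: m clauses drawn independently, each on k distinct variables chosen uniformly, each literal negated with probability 1/2. The function names and the signed-integer encoding of literals are assumptions made for the illustration.

```python
import random

def random_ksat(n, k, alpha, seed=None):
    # m = round(alpha * n) clauses; each picks k distinct variables from 1..n
    # and negates each one independently with probability 1/2.
    rng = random.Random(seed)
    m = round(alpha * n)
    formula = []
    for _ in range(m):
        variables = rng.sample(range(1, n + 1), k)
        formula.append([v if rng.random() < 0.5 else -v for v in variables])
    return formula

def satisfies(assignment, formula):
    # assignment maps variable index -> bool; every clause needs one true literal.
    return all(any(assignment[abs(l)] == (l > 0) for l in clause) for clause in formula)

f = random_ksat(n=50, k=9, alpha=4.0, seed=1)
print(len(f), f[0])  # 200 clauses, each a list of 9 signed variable indices
```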

In the past two decades, the phase changes in the solution geometry of random
k-SAT ensembles as the clause density increases have gathered much research
attention. The 1RSB ansatz of statistical mechanics says that the space of
solutions of random k-SAT shatters into exponentially many clusters of
solutions when the clause density is sufficiently high. This phase is called
d1RSB (1-Step Dynamic Replica Symmetry Breaking) and was conjectured by
physicists as part of the 1RSB ansatz. It has since been rigorously proved for
high values of k. It demonstrates the properties of high correlation between
large sets of variables that we will need. Specifically, we need the emergence
of cores, which are sets of C clauses all of whose variables lie in a set of
size C (this actually forces C to be O(n)). As the clause density is increased,
the variables in these cores "freeze." Namely, they take the same value
throughout the cluster. Changing the value of a variable within a cluster
necessitates changing O(n) other variables in order to arrive at another
satisfying solution, which would be in a different cluster. Furthermore, as the
clause density is increased towards the SAT-unSAT threshold, each cluster
collapses steadily towards a single solution that is maximally far apart from
every other cluster. Physicists think of this as an "energy gap" between the
clusters. Such stages are precisely the ones that we need, since they possess
the following two properties.

1. Due to strong O(n) correlations that cannot be factored through
conditional independencies, they resist attack by local and bounded first order
stages of a monadic LFP computation.

2. Due to their ampleness, which arises from their instantiations in expo- 
nentially many clusters, they resist attack by complex fixed points that 
produce value limited distributions. 

Finally, as the clause density increases above the SAT-unSAT threshold, the so- 
lution space vanishes, and the underlying instance of SAT is no longer satisfi- 
able. 
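The two notions used above, frozen variables and the Hamming distance between clusters, are elementary to state in code. The sketch below assumes a cluster is given as an explicit list of satisfying assignments (toy data, not an actual k-SAT cluster) and takes the distance between clusters to be the minimum Hamming distance over pairs of solutions; it only illustrates the definitions, and the helper names are hypothetical.

```python
def frozen_variables(cluster):
    # cluster: list of satisfying assignments (equal-length 0/1 tuples).
    # A variable is frozen if it takes the same value in every solution of the cluster.
    first = cluster[0]
    return {i for i in range(len(first)) if all(s[i] == first[i] for s in cluster)}

def hamming(a, b):
    # Number of coordinates on which two assignments differ.
    return sum(x != y for x, y in zip(a, b))

def cluster_distance(c1, c2):
    # Smallest Hamming distance between a solution of c1 and one of c2.
    return min(hamming(a, b) for a in c1 for b in c2)

# Toy clusters (hypothetical data): variables 0-3 are frozen in both clusters.
c1 = [(0, 0, 0, 0, 0, 1), (0, 0, 0, 0, 1, 0)]
c2 = [(1, 1, 1, 1, 0, 1), (1, 1, 1, 1, 1, 1)]
print(frozen_variables(c1), frozen_variables(c2))  # {0, 1, 2, 3} {0, 1, 2, 3, 5}
print(cluster_distance(c1, c2))  # 4: the frozen coordinates 0-3 always differ
```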

We should stress that the picture described above is known to hold in the
case of random k-SAT only for k > 9. For lower values of k, such as k = 3,
there is empirical evidence that this picture does not hold. In other words,
the "true" d1RSB phase arises in random k-SAT for k > 9 as the clause density
rises above $(2^k/k)\ln k$. Since we need all the known properties of the d1RSB
phase, we will work in this regime. Therefore, our proof does not say anything
about the efficacy of various algorithms for 3-SAT, for instance. We
specifically prove that the d1RSB phase is out of reach for polynomial time
algorithms, and this phase is only reached at k > 9. We reproduce the
rigorously proved picture of the 1RSB ansatz that we will need in Chapter 6.
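For orientation, the densities involved can be evaluated numerically. The sketch below treats $(2^k/k)\ln k$ only as the leading-order expression quoted above (lower-order corrections are ignored) and, as an outside reference point not used elsewhere in this work, compares it with the known large-k expansion $2^k \ln 2 - (1+\ln 2)/2$ of the SAT-unSAT threshold. For k = 9 the first quantity is about 125 and the second about 354, so the window between them is wide. The function names are hypothetical.

```python
import math

def clustering_density(k):
    # Leading-order clause density (2^k / k) ln k quoted for the onset of the d1RSB phase.
    return (2 ** k / k) * math.log(k)

def sat_threshold_estimate(k):
    # Large-k expansion of the SAT-unSAT threshold, up to o(1) corrections in k.
    return 2 ** k * math.log(2) - (1 + math.log(2)) / 2

for k in range(9, 13):
    print(k, round(clustering_density(k), 1), round(sat_threshold_estimate(k), 1))
```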

In Chapter 7, we make a brief excursion into the random graph theory of the
factor graph ensembles underlying random k-SAT. From here, we obtain results
that asymptotically almost surely upper bound the size of the largest cliques
in the neighborhood systems on the Gaifman graphs that we study later, when we
build models for the range limited interactions that occur during monadic LFP.
These provide us with bounds on the largest irreducible interactions between
variables during the various stages of an LFP computation.

Finally, in Chapter 8, we pull all the threads and machinery together. First,
we encode k-SAT instances as queries on structures over a certain vocabulary
in a way that LFP captures all polynomial time computable queries on them.
We then set up the framework whereby we can generate distributions of
solutions to each instance by asking a purported LFP algorithm for k-SAT to
extend partial assignments on variables to full satisfying assignments.

Next, we embed the space of covariates into a larger product space, which
allows us to "disentangle" the flow of information during an LFP computation.
This allows us to study the computations performed by the LFP with various
initial values under a directed graphical model. This model is only
polynomially larger than the structure itself. We call this the
Element-Neighborhood-Stage Product, or ENSP, model. The distribution of
solutions generated by LFP is then a mixture of distributions, each of which
factors according to an ENSP.

At this point, we wish to measure the growth of independent parameters of
distributions of solutions whose embeddings into the larger product space
factor over the ENSP. In order to do so, we utilize the following properties
for range limited models.

1. The directed nature of the model that comes from properties of LFP. 

2. The properties of neighborhoods that are obtained by studies on random
graph ensembles, specifically that neighborhoods that occur during the
LFP computation are of size poly(log n) asymptotically almost surely in
the $n \to \infty$ limit.

3. The locality and boundedness properties of FO that put constraints upon 
each individual stage of the LFP computation. 






4. Simple properties of LFP, such as the closure ordinal being a polynomial 
in the structure size. 

The crucial property that allows us to analyze mixtures of range limited dis- 
tributions that factor according to some ENSP is that we can parametrize the 
distribution using potentials on cliques of its moralized graph that are of size at 
most poly(log n). This means that when the mixture is exponentially numerous,
we will see features that reflect the poly(log n) factor size of the conditionally
independent parametrization. 

Next, we come to value limited models. Here, interactions are of size O(n),
but they are limited to only $c^{\mathrm{poly}(\log n)}$ values, thereby giving
us poly(log n)-parametrization. We show how to deal with mixtures of value
limited models. We build a technique that merges various O(n) potentials that
are poly(log n)-parametrizable into a single potential that is also
poly(log n)-parametrizable, and covers the entire graphical model (which has
poly(n) variables).

Now we close the loop and show that a distribution of solutions for k-SAT
constructed by any purported LFP algorithm (monadic or complex) would not have
enough parameters to describe the known picture of k-SAT in the d1RSB phase for
k > 9, namely, the presence of extensive frozen variables in exponentially many
clusters, with the Hamming distance between the clusters being O(n). In
particular, in exponentially numerous mixtures of range limited models, we
would have conditionally independent variation between blocks of poly(log n)
variables, causing the Hamming distance between solutions to be of this order
as well. In other words, solutions for k-SAT that are constructed using range
limited LFP will display aggregate behavior that reflects that they are
constructed out of "building blocks" of size poly(log n). This behavior will
manifest when exponentially many solutions are generated by the LFP
construction. The case of value limited LFP also leads to a contradiction, since
it would be unable to explain the exponentially many cluster instantiations of
cores that are present in the d1RSB phase.

This shows that LFP cannot express the satisfiability query in the d1RSB
phase for high enough k, and separates P from NP. This also explains the
empirical observation that all known polynomial time algorithms fail in the
d1RSB phase for high values of k, and also establishes on rigorous principles
the physics intuition about the onset of extensive long range correlations in
the d1RSB phase that causes all known polynomial time algorithms to fail. It
also completes this picture, since it says that extensive O(n) correlations
that (a) cannot factor through conditional independencies and (b) are amply
instantiated are the source of failure of polynomial time algorithms.






2. Interaction Models and Conditional Independence



Systems involving a large number of variables interacting in complex ways are 
ubiquitous in the mathematical sciences. These interactions induce dependen- 
cies between the variables. Because of the presence of such dependencies in a 
complex system with interacting variables, it is not often that one encounters in- 
dependence between variables. However, one frequently encounters conditional 
independence between sets of variables. Both independence and conditional in- 
dependence among sets of variables have been standard objects of study in 
probability and statistics. Speaking in terms of algorithmic complexity, one
often hopes that by exploiting the conditional independence between certain
sets of variables, one may avoid the cost of enumerating an exponential number
of hypotheses in evaluating functions of the distribution that are of interest.



2.1 Conditional Independence 

We first fix some notation. Random variables will be denoted by upper case 
letters such as X, Y, Z, etc. The values a random variable takes will be denoted 
by the corresponding lower case letters, such as x,y,z. Throughout this work, 
we assume our random variables to be discrete unless stated otherwise. We
may also assume that they take values in a common finite state space, which
we usually denote by $\Lambda$ following physics convention. We denote the probability
mass functions of discrete random variables $X, Y, Z$ by $P_X(x), P_Y(y), P_Z(z)$
respectively. Similarly, $P_{X,Y}(x, y)$ will denote the joint mass of $(X, Y)$, and so
on. We drop subscripts on the $P$ when it causes no confusion. We freely use the
term "distribution" for the probability mass function.

The notion of conditional independence is central to our proof. The intuitive 
definition of the conditional independence of X from Y given Z is that the con- 
ditional distribution of X given (Y, Z) is equal to the conditional distribution 
of X given Z alone. This means that once the value of Z is given, no further 
information about the value of X can be extracted from the value of Y. This 
is an asymmetric definition, and can be replaced by the following symmetric 
definition. Recall that X is independent of Y if 

$$P(x, y) = P(x)\,P(y).$$

Definition 2.1. Let notation be as above. X is conditionally independent of Y 
given $Z$, written $X \perp\!\!\!\perp Y \mid Z$, if

$$P(x, y \mid z) = P(x \mid z)\,P(y \mid z)$$

for all $x, y$ and all $z$ with $P(z) > 0$.

The asymmetric version which says that the information contained in Y is 
superfluous to determining the value of X once the value of Z is known may 
be represented as 

$$P(x \mid y, z) = P(x \mid z).$$
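Definition 2.1 can be checked mechanically on a small discrete joint distribution. The sketch below is a minimal illustration, not part of the original development; the helper names `marginal` and `is_cond_independent` and both example distributions are our own. It uses exact rational arithmetic and the cross-multiplied form $P(x,y,z)\,P(z) = P(x,z)\,P(y,z)$, which is equivalent to Definition 2.1 wherever $P(z) > 0$ and avoids dividing by zero.

```python
from fractions import Fraction
from itertools import product


def marginal(joint, keep):
    """Sum a joint pmf {(x, y, z): prob} down to the coordinates listed in `keep`."""
    out = {}
    for outcome, prob in joint.items():
        key = tuple(outcome[i] for i in keep)
        out[key] = out.get(key, 0) + prob
    return out


def is_cond_independent(joint):
    """Check X ⊥⊥ Y | Z for a joint pmf over triples (x, y, z)."""
    p_z = marginal(joint, [2])
    p_xz = marginal(joint, [0, 2])
    p_yz = marginal(joint, [1, 2])
    xs = {o[0] for o in joint}
    ys = {o[1] for o in joint}
    for (z,), pz in p_z.items():
        for x, y in product(xs, ys):
            lhs = joint.get((x, y, z), 0) * pz
            rhs = p_xz.get((x, z), 0) * p_yz.get((y, z), 0)
            if lhs != rhs:
                return False
    return True


# A joint built as p(z) p(x|z) p(y|z) satisfies X ⊥⊥ Y | Z by construction.
p_z = {0: Fraction(1, 3), 1: Fraction(2, 3)}
p_x_given_z = {0: {0: Fraction(1, 4), 1: Fraction(3, 4)}, 1: {0: Fraction(1, 2), 1: Fraction(1, 2)}}
p_y_given_z = {0: {0: Fraction(2, 5), 1: Fraction(3, 5)}, 1: {0: Fraction(9, 10), 1: Fraction(1, 10)}}
ci_joint = {(x, y, z): p_z[z] * p_x_given_z[z][x] * p_y_given_z[z][y]
            for x, y, z in product([0, 1], repeat=3)}

# Here Y is a copy of X, so knowing Z does not screen Y off from X.
copy_joint = {(x, x, z): Fraction(1, 4) for x, z in product([0, 1], repeat=2)}

print(is_cond_independent(ci_joint))    # True
print(is_cond_independent(copy_joint))  # False
```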

The notion of conditional independence pervades statistical theory [Daw79, 
Daw80]. Several notions from statistics may be recast in this language. 

Example 2.2. The notion of sufficiency may be seen as the presence of a certain
conditional independence [Daw79]. A sufficient statistic $T$ in the problem
of parameter estimation is that which renders the estimate of the parameter
independent of any further information from the sample $X$. Thus, if $\Theta$ is the
parameter to be estimated, then $T$ is a sufficient statistic if

$$p(\theta \mid x) = p(\theta \mid t).$$

Thus, all there is to be gained from the sample in terms of information about
$\Theta$ is already present in $T$ alone. In particular, if the posterior over $\Theta$ is being
computed by Bayesian inference, then the above relation says that this posterior
depends on the data $X$ only through the value of $T$. Clearly, such a statement
would lead to a reduction in the complexity of inference.
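To make Example 2.2 concrete, here is a small sketch assuming i.i.d. Bernoulli observations and a hypothetical three-point prior on $\Theta$; both are our choices for illustration, not taken from the text. For this model $T = \sum_i X_i$ is sufficient, so two different samples with the same value of $T$ should produce exactly the same posterior.

```python
from fractions import Fraction

# Hypothetical uniform prior over three candidate values of the parameter Θ.
prior = {Fraction(1, 4): Fraction(1, 3), Fraction(1, 2): Fraction(1, 3), Fraction(3, 4): Fraction(1, 3)}


def posterior(sample):
    """Exact posterior p(θ | x) for i.i.d. Bernoulli(θ) observations x."""
    unnorm = {}
    for theta, p in prior.items():
        likelihood = Fraction(1)
        for xi in sample:
            likelihood *= theta if xi == 1 else 1 - theta
        unnorm[theta] = p * likelihood
    total = sum(unnorm.values())
    return {theta: w / total for theta, w in unnorm.items()}


# Two different samples that share the same value of T = sum(x).
x_a = [1, 0, 1, 1, 0]
x_b = [0, 1, 1, 0, 1]
print(sum(x_a), sum(x_b))                # 3 3
print(posterior(x_a) == posterior(x_b))  # True
```

Changing the sample to one with a different count (say four ones) changes the posterior, as expected.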

2.2 Conditional Independence in Undirected Graphical Models

Graphical models offer a convenient framework and methodology to describe 
and exploit conditional independence between sets of variables in a system. 
One may think of the graphical model as representing the family of distribu- 
tions whose law fulfills the conditional independence statements made by the 
graph. A member of this family may satisfy any number of additional conditional
independence statements, but no fewer than those prescribed by the graph.
In general, we will consider graphs $\mathcal{G} = (V, E)$ whose $n$ vertices index a set of
$n$ random variables $(X_1, \ldots, X_n)$. The random variables all take their values
in a common state space $\Lambda$. The random vector $(X_1, \ldots, X_n)$ then takes values
in a configuration space $\Omega_n = \Lambda^n$. We will denote values of the random vector
$(X_1, \ldots, X_n)$ simply by $x = (x_1, \ldots, x_n)$. The notation $X_{V \setminus I}$ will denote the set
of variables excluding those whose indices lie in the set $I$. Let $P$ be a probability
measure on the configuration space. We will study the interplay between
conditional independence properties of $P$ and its factorization properties.

There are, broadly, two kinds of graphical models: directed and undirected. 
We first consider the case of undirected models. Fig. 2.1 illustrates an undirected 
graphical model with ten variables. 

Random Fields and Markov Properties 

Graphical models are very useful because they allow us to read off conditional 
independencies of the distributions that satisfy these models from the graph 
itself. Recall that we wish to study the relation between conditional indepen- 
dence of a distribution with respect to a graphical model, and its factorization. 
Figure 2.1: An undirected graphical model. Each vertex represents a random
variable. The vertices in set $A$ are separated from those in set $B$ by set $C$. For
random variables to satisfy the global Markov property relative to this graphical
model, the corresponding sets of random variables must be conditionally
independent. Namely, $A \perp\!\!\!\perp B \mid C$.

Towards that end, one may write increasingly stringent conditional indepen- 
dence properties that a set of random variables satisfying a graphical model 
may possess, with respect to the graph. In order to state these, we first define 
two graph theoretic notions — those of a general neighborhood system, and of 
separation. 

Definition 2.3. Given a set of variables $S$ known as sites, a neighborhood system
$\mathcal{N}_S$ on $S$ is a collection of subsets $\{\mathcal{N}_i : 1 \le i \le n\}$ indexed by the sites in $S$ that
satisfy

1. a site is not a neighbor to itself (this also means there are no self-loops in
the induced graph): $s_i \notin \mathcal{N}_i$, and

2. the relationship of being a neighbor is mutual: $s_i \in \mathcal{N}_j \iff s_j \in \mathcal{N}_i$.

In many applications, the sites are vertices on a graph, and the neighborhood
system $\mathcal{N}_i$ is the set of neighbors of vertex $s_i$ on the graph. We will often be
interested in homogeneous neighborhood systems of $S$ on a graph in which, for
each $s_i \in S$, the neighborhood $\mathcal{N}_i$ is defined as

$$\mathcal{N}_i := \{s_j \in S : 0 < d(s_i, s_j) \le r\},$$

where $d(\cdot, \cdot)$ is the shortest-path distance on the graph. Namely, in such
neighborhood systems, the neighborhood of a site is simply the set of other sites
that lie in the radius $r$ ball around that site. Note that the nearest neighbor
system that is often used in physics is just the case $r = 1$. We will need
to use the general case, where $r$ will be determined by considerations from logic
that will be introduced in the next two chapters. We will use the term "variable"
freely in place of "site" when we move to logic.
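The homogeneous neighborhood system can be computed by a breadth-first search cut off at radius $r$. The sketch below assumes the graph is given as an adjacency dictionary (our convention), and the function name `radius_neighborhoods` is hypothetical. It also checks the two conditions of Definition 2.3 on a small path graph.

```python
from collections import deque


def radius_neighborhoods(adj, r):
    """Return {i: N_i} with N_i = {j : 0 < d(i, j) <= r}, d the shortest-path distance."""
    nbhd = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            if dist[u] == r:
                continue  # do not expand beyond radius r
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        nbhd[source] = {v for v, d in dist.items() if 0 < d <= r}
    return nbhd


# A path graph 0 - 1 - 2 - 3 - 4.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
n1 = radius_neighborhoods(path, 1)  # the nearest neighbor system
n2 = radius_neighborhoods(path, 2)
print(sorted(n1[2]), sorted(n2[2]))  # [1, 3] [0, 1, 3, 4]

# Definition 2.3: no site is its own neighbor, and the relation is symmetric.
assert all(i not in n2[i] for i in n2)
assert all((j in n2[i]) == (i in n2[j]) for i in n2 for j in n2)
```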

Definition 2.4. Let $A, B, C$ be three disjoint subsets of the vertices $V$ of a graph
$\mathcal{G}$. The set $C$ is said to separate $A$ and $B$ if every path from a vertex in $A$ to a
vertex in $B$ must pass through $C$.
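Definition 2.4 suggests a direct test: $C$ separates $A$ from $B$ exactly when, after deleting the vertices of $C$, no vertex of $B$ is reachable from $A$. A minimal sketch, assuming disjoint vertex sets and an adjacency-dictionary representation (the name `separates` is ours):

```python
from collections import deque


def separates(adj, A, B, C):
    """True if every path from A to B in the undirected graph passes through C."""
    A, B, C = set(A), set(B), set(C)
    seen = set(A)
    queue = deque(A)
    while queue:
        u = queue.popleft()
        if u in B:
            return False  # reached B without entering C
        for v in adj[u]:
            if v not in C and v not in seen:
                seen.add(v)
                queue.append(v)
    return True


# The 4-cycle 1 - 2 - 3 - 4 - 1.
cycle = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {3, 1}}
print(separates(cycle, {1}, {3}, {2, 4}))  # True
print(separates(cycle, {1}, {3}, {2}))     # False: the path 1 - 4 - 3 avoids C
```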

Now we return to the case of the vertices indexing random variables $(X_1, \ldots, X_n)$
and the vector $(X_1, \ldots, X_n)$ taking values in a configuration space $\Omega_n$. A probability
measure $P$ on $\Omega_n$ is said to satisfy certain Markov properties with respect
to the graph when it satisfies the appropriate conditional independencies with
respect to that graph. We will study the following two Markov properties, and
their relation to factorization of the distribution.

Definition 2.5. 1. The local Markov property. The variable $X_i$ (for every $i$)
is conditionally independent of the rest of the graph given just the variables
that lie in the neighborhood of the vertex. In other words, the influence
that variables exert on any given variable is completely described by
the influence that is exerted through the neighborhood variables alone.

2. The global Markov property. For any disjoint subsets $A, B, C$ of $V$ such that
$C$ separates $A$ from $B$ in the graph, it holds that

$$A \perp\!\!\!\perp B \mid C.$$

We are interested in distributions that do satisfy such properties, and will
examine what effect these Markov properties have on the factorization of the
distributions. For most applications, this is done in the context of Markov random
fields.

We motivate a Markov random field with the simple example of a Markov 
chain $\{X_n : n \ge 0\}$. The Markov property of this chain is that any variable in
the chain is conditionally independent of all other variables in the chain given
just its immediate neighbors:

$$X_n \perp\!\!\!\perp \{X_k : k \notin \{n-1, n, n+1\}\} \mid X_{n-1}, X_{n+1}.$$

A Markov random field is the natural generalization of this picture to higher 
dimensions and more general neighborhood systems. 

Definition 2.6. The collection of random variables $X_1, \ldots, X_n$ is a Markov random
field with respect to a neighborhood system on $\mathcal{G}$ if and only if the following
two conditions are satisfied.

1. The distribution is positive on the space of configurations: $P(x) > 0$ for all $x \in \Omega_n$.

2. The distribution at each vertex is conditionally independent of all other
vertices given just those in its neighborhood:

$$P(X_i \mid X_{V \setminus \{i\}}) = P(X_i \mid X_{\mathcal{N}_i}).$$

These local conditional distributions are known as local characteristics of 
the field. 

The second condition says that Markov random fields satisfy the local Markov 
property with respect to the neighborhood system. Thus, we can think of inter- 
actions between variables in Markov random fields as being characterized by 
"piecewise local" interactions. Namely, the influence of far away vertices must 
"factor through" local interactions. This may be interpreted as: 

The influence of far away variables is limited to that which is transmit- 
ted through the interspersed intermediate variables — there is no "direct" 
influence of far away vertices beyond that which is factored through such 
intermediate interactions. 
However, through such local interactions, a vertex may influence any other vertex
arbitrarily far away. Notice, though, that this is a considerably simpler picture
than having to consult the joint distribution over all variables for all interac- 
tions, for here, we need only know the local joint distributions and use these to 
infer the correlations of far away variables. We shall see in later chapters that 
this picture, with some additional caveats, is at the heart of polynomial time 
computations. 

Note the positivity condition on Markov random fields. With this positivity 
condition, the complete set of conditionals given by the local characteristics of 
a field determines the joint distribution [Bes74].
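The claim that the local characteristics determine a positive joint distribution can be illustrated with Brook's lemma, the ratio identity used in [Bes74]. The sketch below is for binary variables only; the example joint and the function names are hypothetical. The reconstruction reads nothing but the conditionals $P(X_i \mid X_{V \setminus \{i\}})$ and recovers the joint exactly; positivity is what keeps every ratio well defined.

```python
from fractions import Fraction
from itertools import product


def reconstruct(local_char, n):
    """Rebuild a positive joint pmf on {0,1}^n from its local characteristics alone.

    `local_char(z, i)` must return P(X_i = z_i | the other coordinates of z).
    Brook's lemma, with the fixed reference configuration y = (0, ..., 0):
        p(x) / p(y) = prod_i P(x_i | x_<i, y_>i) / P(y_i | x_<i, y_>i).
    """
    y = (0,) * n
    ratios = {}
    for x in product([0, 1], repeat=n):
        r = Fraction(1)
        for i in range(n):
            num = x[:i + 1] + y[i + 1:]  # coordinate i takes x_i
            den = x[:i] + y[i:]          # coordinate i takes y_i; all others agree
            r *= local_char(num, i) / local_char(den, i)
        ratios[x] = r
    total = sum(ratios.values())
    return {x: r / total for x, r in ratios.items()}


# A hypothetical positive joint on {0,1}^3 with masses 1/36, 2/36, ..., 8/36.
p = {x: Fraction(k + 1, 36) for k, x in enumerate(product([0, 1], repeat=3))}


def local_characteristic(z, i):
    """P(X_i = z_i | z_{-i}), read off from p; only these values reach reconstruct."""
    flipped = z[:i] + (1 - z[i],) + z[i + 1:]
    return p[z] / (p[z] + p[flipped])


print(reconstruct(local_characteristic, 3) == p)  # True
```

If some configuration had zero mass, a conditional in the denominator could vanish and the identity would break down, which is one way to see why the positivity condition matters.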

Markov random fields satisfy the global Markov property as well. 

Theorem 2.7. Markov random fields with respect to a neighborhood system satisfy the 
global Markov property with respect to the graph constructed from the neighborhood 
system. 

Markov random fields originated in statistical mechanics [Dob68], where 
they model probability measures on configurations of interacting particles, such 
as Ising spins. See [KS80] for a treatment that focusses on this setting. Their lo- 
cal properties were later found to have applications to analysis of images and 
other systems that can be modelled through some form of spatial interaction. 
This field started with [Bes74] and came into its own with [GG84] which ex- 
ploited the Markov-Gibbs correspondence that we will deal with shortly. See 
also [Li09]. 

2.2.1 Gibbs Random Fields and the Hammersley-Clifford Theorem

We are interested in how the Markov properties of the previous section trans- 
late into factorization of the distribution. Note that Markov random fields are 
characterized by a local condition — namely, their local conditional indepen- 
dence characteristics. We now describe another random field that has a global 
characterization — the Gibbs random field. 
Definition 2.8. A Gibbs random field (or Gibbs distribution) with respect to a neighborhood
system $\mathcal{N}$ on the graph $\mathcal{G}$ is a probability measure on the set of configurations
$\Omega_n$ having a representation of the form

$$P(x_1, \ldots, x_n) = \frac{1}{Z} \exp\left(-\frac{U(x)}{T}\right),$$

where

1. $Z$ is the partition function and is a normalizing factor that ensures that the
measure sums to unity:

$$Z = \sum_{x \in \Omega_n} \exp\left(-\frac{U(x)}{T}\right).$$

Evaluating $Z$ explicitly is hard in general since it is a summation over each
of the $|\Lambda|^n$ configurations in the space.

2. $T$ is a constant known as the "temperature" that has origins in statistical
mechanics. It controls the sharpness of the distribution. At high temperatures,
the distribution tends to be uniform over the configurations. At low
temperatures, it tends towards a distribution that is supported only on the
lowest energy states.

3. $U(x)$ is the "energy" of configuration $x$ and takes the following form as a
sum

$$U(x) = \sum_{c \in \mathcal{C}} V_c(x)$$

over the set of cliques $\mathcal{C}$ of $\mathcal{G}$. The functions $\{V_c : c \in \mathcal{C}\}$ are the clique potentials,
such that the value of $V_c(x)$ depends only on the coordinates of $x$ that
lie in the clique $c$. These capture the interactions between vertices in the
clique.

Thus, a Gibbs random field has a probability distribution that factorizes into
its constituent "interaction potentials." This says that the probability of a configuration
depends only on the interactions that occur between the variables,
broken up into cliques. For example, if (to think in terms of statistical mechanics)
each particle in a system interacts with only 2 other particles at a time, then the
energy of each state would be expressible as a sum of potentials, each of which
has just three variables in its support. Thus,
the Gibbs factorization carries in it a faithful representation of the underlying
interactions between the particles. This type of factorization obviously yields
a "simpler description" of the distribution. The precise notion is the number of
independent parameters it takes to specify the distribution. Factorization into
conditionally independent interactions of scope $k$ means that we can specify the
distribution with $O(|\Lambda|^k)$ parameters per factor rather than $O(|\Lambda|^n)$ for the
unfactorized joint. We will return to this at the
end of this chapter.
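The following sketch evaluates a Gibbs distribution by brute force, computing $Z$ as an explicit sum over all configurations. This is feasible only for very small $n$, which is exactly the cost noted in item 1. The four-spin chain with $\Lambda = \{0, 1\}$ and its "disagreement" pair potential are hypothetical, Ising-like choices made for illustration. Running it at a high and a low temperature shows the behavior described in item 2.

```python
import math
from itertools import product


def gibbs(potentials, n, T):
    """Gibbs distribution P(x) = exp(-U(x)/T) / Z on the binary configurations {0,1}^n.

    `potentials` is a list of (clique, V_c) pairs, where V_c looks only at the
    coordinates of x indexed by `clique`, and U(x) is the sum of all V_c.
    Z is computed by brute-force enumeration of all 2^n configurations.
    """
    weights = {}
    for x in product([0, 1], repeat=n):
        energy = sum(V(tuple(x[i] for i in clique)) for clique, V in potentials)
        weights[x] = math.exp(-energy / T)
    Z = sum(weights.values())  # the partition function
    return {x: w / Z for x, w in weights.items()}


def disagree(xc):
    """Hypothetical pair potential: energy 1 if the two spins differ, else 0."""
    return 0.0 if xc[0] == xc[1] else 1.0


# A chain of four spins: the cliques are the three edges 0-1, 1-2 and 2-3.
chain = [((0, 1), disagree), ((1, 2), disagree), ((2, 3), disagree)]

hot = gibbs(chain, 4, T=100.0)  # nearly uniform: every mass is close to 1/16
cold = gibbs(chain, 4, T=0.05)  # concentrates on the two zero-energy states
print(max(hot.values()), cold[(0, 0, 0, 0)] + cold[(1, 1, 1, 1)])
```

Each potential here has a clique of size two, so in the terminology introduced next this distribution has degree 2.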

Definition 2.9. Let $P$ be a Gibbs distribution whose energy function is $U(x) = \sum_{c \in \mathcal{C}} V_c(x)$.
The support of the potential $V_c$ is the cardinality of the clique $c$. The
degree of the distribution $P$, denoted by $\deg(P)$, is the maximum of the supports
of the potentials. In other words, the degree of the distribution is the size of the
largest clique that occurs in its factorization.

One may immediately see that the degree of a distribution is a measure of
the complexity of interactions in the system, since it is the size of the largest set
of variables whose interaction cannot be split up in terms of smaller interactions
between subsets. One would expect this to be the hurdle in efficient algorithmic
applications.

The Hammersley-Clifford theorem relates the two types of random fields. 

Theorem 2.10 (Hammersley-Clifford). $X$ is a Markov random field with respect to a
neighborhood system $\mathcal{N}$ on the graph $\mathcal{G}$ if and only if it is a Gibbs random field with
respect to the same neighborhood system.

The theorem appears in the unpublished manuscript [HC71] and uses a cer- 
tain "blackening algebra" in the proof. The first published proofs appear in 
[Bes74] and [Mou74]. 

Note that the condition of positivity on the distribution (which is part of 
the definition of a Markov random field) is essential to state the theorem in 
full generality. The following example from [Mou74] shows that relaxing this
condition allows us to build distributions having the Markov property, but not
the Gibbs property.

Example 2.11. Consider a system of four binary variables $\{X_1, X_2, X_3, X_4\}$.
Each of the following combinations has probability 1/8, while the remaining
combinations are disallowed.

(0,0,0,0) (1,0,0,0) (1,1,0,0) (1,1,1,0)
(0,0,0,1) (0,0,1,1) (0,1,1,1) (1,1,1,1).

We may check that this distribution has the global Markov property with respect
to the 4-vertex cycle graph. Namely, we have

$$X_1 \perp\!\!\!\perp X_3 \mid X_2, X_4 \quad \text{and} \quad X_2 \perp\!\!\!\perp X_4 \mid X_1, X_3.$$

However, the distribution does not factorize into Gibbs potentials. 
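The conditional independencies claimed in Example 2.11 can be verified by enumeration. This sketch (helper names ours) uses exact fractions and the cross-multiplied form of Definition 2.1. It also confirms that the distribution is not positive, which is why the Hammersley-Clifford theorem does not apply to it.

```python
from fractions import Fraction
from itertools import product

# The eight allowed configurations of (X_1, X_2, X_3, X_4), each with mass 1/8.
support = ["0000", "1000", "1100", "1110", "0001", "0011", "0111", "1111"]
p = {tuple(int(ch) for ch in s): Fraction(1, 8) for s in support}


def prob(assign):
    """P(X_i = v for every (i, v) in `assign`); indices 0..3 stand for X_1..X_4."""
    return sum((q for x, q in p.items() if all(x[i] == v for i, v in assign.items())), Fraction(0))


def cond_indep(a, b, given):
    """Check X_a ⊥⊥ X_b | X_given via P(a, b, c) P(c) = P(a, c) P(b, c) for all values."""
    for va, vb, *vc in product([0, 1], repeat=2 + len(given)):
        c = dict(zip(given, vc))
        lhs = prob({a: va, b: vb, **c}) * prob(c)
        rhs = prob({a: va, **c}) * prob({b: vb, **c})
        if lhs != rhs:
            return False
    return True


print(cond_indep(0, 2, [1, 3]))  # True:  X_1 ⊥⊥ X_3 | X_2, X_4
print(cond_indep(1, 3, [0, 2]))  # True:  X_2 ⊥⊥ X_4 | X_1, X_3
print(all(x in p for x in product([0, 1], repeat=4)))  # False: positivity fails
```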




Figure 2.2: A factor graph showing the three-clause 3-SAT formula $(X_1 \vee X_4 \vee \neg X_6) \wedge (\neg X_1 \vee X_2 \vee \neg X_3) \wedge (X_4 \vee X_5 \vee X_6)$.
A dashed line indicates that the variable appears negated in the clause.
The distribution modelled by this factor graph will show a factorization as
follows:

$$p(x_1, \ldots, x_6) = \frac{1}{Z}\,\varphi_1(x_1, x_4, x_6)\,\varphi_2(x_1, x_2, x_3)\,\varphi_3(x_4, x_5, x_6), \tag{2.1}$$

where

$$Z = \sum_{x_1, \ldots, x_6} \varphi_1(x_1, x_4, x_6)\,\varphi_2(x_1, x_2, x_3)\,\varphi_3(x_4, x_5, x_6). \tag{2.2}$$

Factor graphs offer a finer grained view of factorization of a distribution 
than Bayesian networks or Markov networks. One should keep in mind that 
this factorization is (in general) far from being a factorization into conditionals 
and does not express conditional independence. The system must embed each 
of these factors in ways that are global and not obvious from the factors. This 
global information is contained in the partition function. Thus, in general, these
factors do not represent conditionally independent pieces of the joint distribution.
In summary, the factorization above is not the one we are seeking:
it does not imply a series of conditional independencies in the joint distribution.
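As an illustration of Eqs. (2.1) and (2.2), suppose each factor is the 0/1 indicator that its clause is satisfied. This choice of factor values is our assumption, since the text does not fix them. Then $Z$ counts the satisfying assignments of the formula in Fig. 2.2, and it can only be obtained by summing over all configurations jointly, not factor by factor.

```python
from itertools import product


# Clause indicators for the formula of Fig. 2.2 (a hypothetical choice of factor
# values): each factor is 1 when its clause is satisfied and 0 otherwise.
def phi1(x1, x4, x6):
    return int(x1 or x4 or not x6)


def phi2(x1, x2, x3):
    return int(not x1 or x2 or not x3)


def phi3(x4, x5, x6):
    return int(x4 or x5 or x6)


def unnormalized(x):
    x1, x2, x3, x4, x5, x6 = x
    return phi1(x1, x4, x6) * phi2(x1, x2, x3) * phi3(x4, x5, x6)


configs = list(product([0, 1], repeat=6))
Z = sum(unnormalized(x) for x in configs)      # Eq. (2.2): a sum over all 2^6 configurations
p = {x: unnormalized(x) / Z for x in configs}  # Eq. (2.1)
print(Z)  # 41, the number of satisfying assignments
```

Each clause rules out 8 of the 64 assignments, and only the second and third clauses can be violated simultaneously (by a single assignment), so $Z = 64 - 24 + 1 = 41$. None of the three factors taken alone reveals this number.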

Factor graphs have been very useful in various applications, most notably 
perhaps in coding theory, where they are used as graphical models that underlie
various decoding algorithms based on forms of belief propagation (also
known as the sum-product algorithm), which is an exact algorithm for computing
marginals on tree graphs but performs remarkably well even in the presence of
loops. See [KFaL98] and [AM00] for surveys of this field. As might be expected
from the preceding comments, these do not focus on conditional independence,
but rather on algorithmic applications of local features (such as being locally tree-like)
of factor graphs.

A Hammersley-Clifford type theorem holds over the completion of a factor 
graph. A clique in a factor graph is a set of variable nodes such that every pair 
in the set is connected by a function node. The completion of a factor graph is 
obtained by introducing a new function node for each clique, and connecting 
it to all the variable nodes in the clique, and no others. Then, a positive distri- 
bution that satisfies the global Markov property with respect to a factor graph 
satisfies the Gibbs property with respect to its completion. 
2.4 The Markov-Gibbs Correspondence for Directed Models

Consider first a directed acyclic graph (DAG), which is simply a directed graph
without any directed cycles in it. Some specific points of additional terminology
for directed graphs are as follows. If there is a directed edge from $x$ to $y$, we say
that $x$ is a parent of $y$, and $y$ is the child of $x$. The set of parents of $x$ is denoted
by $\mathrm{pa}(x)$, while the set of children of $x$ is denoted by $\mathrm{ch}(x)$. The set of vertices
from which directed paths lead to $x$ is called the ancestor set of $x$ and is denoted
$\mathrm{an}(x)$. Similarly, the set of vertices to which directed paths from $x$ lead is called
the descendant set of $x$ and is denoted $\mathrm{de}(x)$. Note that DAGs are allowed to have
loops, that is, cycles in the underlying undirected graph (and loopy DAGs are central
to the study of iterative decoding algorithms on graphical models). Finally, we often
assume that the graph is equipped with a distance function $d(\cdot, \cdot)$ between vertices,
which is just the length of the shortest path between them. A set of random variables
whose interdependencies may be represented using a DAG is known as a Bayesian network
or a directed Markov field. The idea is best illustrated with a simple example.

Consider the DAG of Fig. 2.3 (left). The corresponding factorization of the 
joint density that is induced by the DAG model is 

p{xi, ...,Xq)= p{xi)p{x2)p{x2.)p{xa I Xi)p{x^ I X2, Xs, Xi). 

Thus, every joint distribution that satisfies this DAG factorizes as above. 

Given a directed graphical model, one may construct an undirected one by
a process known as moralization. In moralization, we (a) replace a directed edge
from one vertex to another by an undirected one between the same two vertices
and (b) "marry" the parents of each vertex by introducing edges between each
pair of parents of the vertex at the head of the former directed edge. The process
is illustrated in Fig. 2.3.
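Moralization is straightforward to implement. In this sketch a DAG is represented as a dictionary mapping each vertex to its set of parents (our convention). The five-vertex example DAG is hypothetical and is not a transcription of Fig. 2.3.

```python
from itertools import combinations


def moralize(parents):
    """Moral graph of a DAG given as {child: set_of_parents}.

    (a) every directed edge parent -> child becomes an undirected edge, and
    (b) every pair of parents of a common child is "married".
    Edges are returned as frozensets, so {u, v} and {v, u} are the same edge.
    """
    edges = set()
    for child, pas in parents.items():
        for pa in pas:
            edges.add(frozenset((pa, child)))
        for u, v in combinations(sorted(pas), 2):
            edges.add(frozenset((u, v)))
    return edges


# Hypothetical DAG: x4 has parent x1; x5 has parents x2, x3, x4.
dag = {1: set(), 2: set(), 3: set(), 4: {1}, 5: {2, 3, 4}}
moral = moralize(dag)
print(sorted(tuple(sorted(e)) for e in moral))
# [(1, 4), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)]
```

The three parents of $x_5$ become a triangle, so $\{x_2, x_3, x_4, x_5\}$ is a clique of the moral graph that can host the factor $p(x_5 \mid x_2, x_3, x_4)$.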

In general, if we denote the set of parents of the variable $x_n$ by $\mathrm{pa}_n$, then
the joint distribution of $(x_1, \ldots, x_N)$ factorizes as

$$p(x_1, \ldots, x_N) = \prod_{n=1}^{N} p(x_n \mid \mathrm{pa}_n).$$

Figure 2.3: The moralization of the DAG on the left to obtain the moralized
undirected graph on the right.

What we want, however, is to obtain a Markov-Gibbs equivalence for such graphical
models in the same manner that the Hammersley-Clifford theorem provided
for positive Markov random fields. We have seen that relaxing the positivity 
condition on the distribution in the Hammersley-Clifford theorem (Thm. 2.10) 
cannot be done in general. In some cases however, one may remove the positiv- 
ity condition safely. In particular, [LDLL90] extends the Hammersley-Clifford 
correspondence to the case of arbitrary distributions (namely, dropping the pos- 
itivity requirement) for the case of directed Markov fields. In doing so, they sim- 
plify and strengthen an earlier criterion for directed graphs given by [KSC84]. 
We will use the result from [LDLL90], which we reproduce next. 

Definition 2.12. A measure $p$ admits a recursive factorization according to graph
$\mathcal{G}$ if there exist non-negative functions, known as kernels, $k^v(\cdot, \cdot)$ for $v \in V$ defined
on $\Lambda \times \Lambda^{|\mathrm{pa}(v)|}$, where the first factor is the state space for $X_v$ and the second
for $X_{\mathrm{pa}(v)}$, such that

$$\sum_{y_v \in \Lambda} k^v(y_v, x_{\mathrm{pa}(v)}) = 1$$

and

$$p = f \cdot \mu, \quad \text{where } f(x) = \prod_{v \in V} k^v(x_v, x_{\mathrm{pa}(v)}),$$

with $\mu$ the counting measure in our discrete setting.

In this case, the kernels $k^v(\cdot, x_{\mathrm{pa}(v)})$ are the conditional densities for the distribution
of $X_v$ conditioned on the value of its parents $X_{\mathrm{pa}(v)} = x_{\mathrm{pa}(v)}$. Now let
$\mathcal{G}^m$ be the moral graph corresponding to $\mathcal{G}$.

Theorem 2.13. If $p$ admits a recursive factorization according to $\mathcal{G}$, then it admits a
factorization (into potentials) according to the moral graph $\mathcal{G}^m$.

D-separation 

We have considered the notion of separation on undirected models and its ef- 
fect on the set of conditional independencies satisfied by the distributions that 
factor according to the model. For directed models, there is an analogous no- 
tion of separation known as D-separation. The notion is what one would expect 
intuitively if one views directed models as representing "flows" of probabilistic 
influence. 

We simply state the property and refer the reader to [KF09, §3.3.1] and [Bis06,
§8.2.2] for discussion and examples. Let $A$, $B$, and $C$ be disjoint sets of vertices on a
directed model. Consider the set of all paths, ignoring the direction of the edges,
coming from a node in $A$ and going to a node in $B$. Such a path is said to be blocked
if one of the following two scenarios occurs.

1. Arrows on the path meet head-to-tail or tail-to-tail at a node in $C$.

2. Arrows meet head-to-head at a node, and neither the node nor any of its
descendants is in $C$.

If all paths from $A$ to $B$ are blocked as above, then $C$ is said to D-separate $A$
from $B$, and every distribution that factorizes according to the DAG must satisfy
$A \perp\!\!\!\perp B \mid C$.
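Checking every path for blocking is cumbersome to code directly. The sketch below instead uses a standard equivalent criterion, associated with [LDLL90]: $C$ d-separates $A$ from $B$ if and only if $C$ separates $A$ from $B$ in the moral graph of the smallest ancestral set containing $A \cup B \cup C$. The DAG representation and the function names are our own. The small examples exercise the head-to-head and head-to-tail cases listed above.

```python
from collections import deque
from itertools import combinations


def ancestral_set(parents, nodes):
    """Return `nodes` together with all of their ancestors in the DAG."""
    result = set(nodes)
    stack = list(nodes)
    while stack:
        for pa in parents[stack.pop()]:
            if pa not in result:
                result.add(pa)
                stack.append(pa)
    return result


def d_separated(parents, A, B, C):
    """Test whether C d-separates A from B in a DAG given as {node: set_of_parents}."""
    A, B, C = set(A), set(B), set(C)
    keep = ancestral_set(parents, A | B | C)
    adj = {v: set() for v in keep}
    for child in keep:
        pas = parents[child]  # parents of kept nodes are kept too
        for pa in pas:        # (a) drop edge directions
            adj[pa].add(child)
            adj[child].add(pa)
        for u, v in combinations(pas, 2):  # (b) marry co-parents
            adj[u].add(v)
            adj[v].add(u)
    # Undirected separation: search outward from A without ever entering C.
    seen = set(A)
    queue = deque(A)
    while queue:
        u = queue.popleft()
        if u in B:
            return False
        for v in adj[u] - C - seen:
            seen.add(v)
            queue.append(v)
    return True


collider = {"a": set(), "b": set(), "c": {"a", "b"}}  # a -> c <- b
chain = {"a": set(), "b": {"a"}, "c": {"b"}}           # a -> b -> c
print(d_separated(collider, {"a"}, {"b"}, set()))  # True: head-to-head at unobserved c
print(d_separated(collider, {"a"}, {"b"}, {"c"}))  # False: observing c unblocks the path
print(d_separated(chain, {"a"}, {"c"}, {"b"}))     # True: head-to-tail at b in C
```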
2.5 I-maps and D-maps

We have seen that there are two broad classes of graphical models — undirected 
and directed — which may be used to represent the interaction of variables 
in a system. The conditional independence properties of these two classes are 
obtained differently. 

Definition 2.14. A graph (directed or undirected) is said to be a D-map ('dependencies
map') for a distribution if every conditional independence statement of
the form $A \perp\!\!\!\perp B \mid C$ for sets of variables $A$, $B$, and $C$ that is satisfied by the
distribution is reflected in the graph. Thus, a completely disconnected graph having
no edges is trivially a D-map for any distribution.

A D-map may express more conditional independencies than the distribution
possesses.

Definition 2.15. A graph (directed or undirected) is said to be an I-map ('independencies
map') for a distribution if every conditional independence statement
of the form $A \perp\!\!\!\perp B \mid C$ for sets of variables $A$, $B$, and $C$ that is expressed
by the graph is also satisfied by the distribution. Thus, a completely connected
graph is trivially an I-map for any distribution.

An I-map may express fewer conditional independencies than the distribution
possesses.

Definition 2.16. A graph that is both an I-map and a D-map for a distribution
is called its P-map ('perfect map').

In other words, a P-map expresses precisely the set of conditional independencies
that are present in the distribution.

Not all distributions have P-maps. Indeed, the class of distributions having
directed P-maps is itself distinct from the class having undirected P-maps, and
neither equals the class of all distributions (see [Bis06, §8.3.4] for examples).
3. Distributions with poly(log n)-Parametrization



We now come to a central theme in our work. Consider a system of $n$ binary
covariates $(X_1, \ldots, X_n)$. To specify their joint distribution $p(x_1, \ldots, x_n)$ in the
absence of any additional information, we would have to give the probability
mass function at each of the $2^n$ configurations that these $n$ variables can take
jointly. The only constraint given on these probability masses is that they must
sum up to 1. Thus, given the function value at $2^n - 1$ configurations, we could
find that at the remaining configuration. This means that in the absence of any
additional information, $n$ covariates require $2^n - 1$ parameters to specify their
joint distribution. Thus, it takes a number of parameters exponential in $n$ to specify
a "true" joint distribution of $n$ covariates. This statement can be made more
precise: the typical joint distribution on $n$ variates requires $O(2^n)$ parameters
for its specification.

In light of the above, a joint distribution that requires only $2^{\mathrm{poly}(\log n)}$ parameters
to specify would seem quite unusual. We would intuitively expect it to be
"much simpler" in some way than the typical joint distribution on $n$ variates.
Indeed, because of the exponent of poly(log n), we would expect that it would
be "somewhat like" a joint distribution on only poly(log n) covariates. In other
words, distributions on $n$ variates but requiring only $2^{\mathrm{poly}(\log n)}$ parameters for
their specification are like the typical distribution on poly(log n) variates. We
shall refer to such distributions as having poly(log n)-parametrization.

Let us take an extreme case of such a "simple" joint distribution. Take the 
case of n covariates, except that we are provided with one critical piece of extra 
information: that the $n$ variates are independent of each other. In that case,
we would need 1 parameter to specify each of their individual distributions,
namely, the probability that it takes the value 1. These $n$ parameters then specify
the joint distribution simply because the distribution factorizes completely
into factors whose scopes are single variables (namely, just the $p(x_i)$), as a result
of the independence. Thus, we go from exponentially many independent
parameters to linearly many if we know that the variates are independent.

Let us consider another extreme case of such a distribution. Consider the
distribution on $n$ variates that is non-zero only at $(0, 0, \ldots, 0)$ and $(1, 1, \ldots, 1)$.
Here, the variates are highly correlated. But once again, we require only two
probability masses, and hence a single independent parameter, to specify the
distribution. In this case, it is because the distribution is supported on only two
out of a possible $2^n$ values. In other words, it is
severely limited by the small number of joint values the covariates take.

In both cases above, the distribution on n covariates required far fewer pa- 
rameters to specify than the typical n variate distribution does. 
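The parameter counts in the cases above can be tabulated directly. This is a trivial sketch that counts free parameters as probability masses minus the single sum-to-one constraint, the same convention as the $2^n - 1$ count; the function names are ours.

```python
def full_joint_params(n):
    """Arbitrary pmf on {0,1}^n: 2^n masses, minus one for the sum-to-one constraint."""
    return 2 ** n - 1


def independent_params(n):
    """Fully independent variates: one parameter P(X_i = 1) per variate."""
    return n


def support_limited_params(support_size):
    """Pmf on a known set of joint values: one mass per value, minus one constraint."""
    return support_size - 1


for n in (10, 20, 40):
    # For n = 10 this prints 1023 for the full joint, 10 for independent variates,
    # and 1 for the two-point distribution on (0, ..., 0) and (1, ..., 1).
    print(n, full_joint_params(n), independent_params(n), support_limited_params(2))
```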

In order to state the typical case of an $n$-variate distribution, we make the
following definition.

Definition 3.1. A distribution on $n$ variates will be called ample if it is supported
on at least $c^n$ joint values for some constant $c > 1$.

In other words, ample distributions take the typical number of joint values. 

Definition 3.2. A distribution on $n$ variates will be said to have irreducible $O(n)$
correlations if there exist correlations between $O(n)$ variates that do not permit
factorization into smaller scopes through conditional independencies.

It is distributions that possess both these properties that are problematic for 
polynomial time algorithms. We will see that distributions constructed by poly- 
nomial time algorithms can have one or the other property, but not both. Note 
that distributions having both these properties require $O(2^n)$ independent parameters
to specify. There is neither factorization, nor limited support, that will
permit more economical parametrization.
This brings us to a key motivating question: What if $n$ covariates had a joint
distribution that required only exponential in poly(log n) many parameters to
specify? When would such a distribution arise, and what would be its limitations?
This question lies at the heart of the P versus NP problem. Indeed, all the machinery
we build and use in this work really takes us to the following insight: Polynomial
time computations build distributions of solutions that can be parameterized using
only exponential in poly(log n) many parameters. Namely, they have poly(log n)-parametrization.
In contrast, in the hard phases of NP-complete problems like k-SAT,
the distribution of solutions requires exponentially many parameters to specify. In
particular, the distribution of solutions in the hard phases of NP-complete problems
displays two properties:

1. The variates are as far from being independent as possible: they interact
with each other $O(n)$ at a time, with no possibility for factorization
into conditional independencies. In other words, the distribution has irreducible
$O(n)$ correlations.

2. The distribution is ample. 

Note that both conditions are required. It is not only long range correlations,
but (a) the non-factorizability of such correlations and (b) ampleness under
such non-factorizable correlations that characterize the solution spaces in hard
phases of NP-complete problems.

3.1 Two Kinds of poly(log n)-parameterizations 

We have seen that distributions on n variates that are poly(log n)-parametrizable 
are very atypical. When do they arise? They can be studied in two categories, 
both of which will correspond to polynomial time algorithms. 

3.1.1 Range Limited Interactions 

As noted earlier, it is not often that complex systems of n interacting variables 
have complete independence between some subsets. What is far more frequent 
is that there are conditional independencies between certain subsets given some
intermediate subset. In this case, the joint will factorize into factors each of
whose scope is a subset of $(X_1, \ldots, X_n)$. If the factorization is into conditionally
independent factors, each of whose scope is of size at most $k$, then we can
parametrize the joint distribution with at most $n2^k$ independent parameters.
We should emphasize that the factors must give us conditional independence
for this to be true. For example, factor graphs give us a factorization, but it is,
in general, not a factorization into conditionally independent pieces, and so we cannot
conclude anything about the number of independent parameters by just examining
the factor graph. From our perspective, a major feature of directed graphical
models is that their factorizations are already globally normalized once they
are locally normalized, meaning that there is a recursive factorization of the 
joint into conditionally independent pieces. The conditional independence in 
this case is from all non-descendants, given the parents. Therefore, if each node 
has at most k parents, we can parametrize the distribution using at most n2^ 
independent parameters. We may also moralize the graph and see this as a fac- 
torization over cliques in the moralized graph. Note that such a factorization 
(namely, starting from a directed model and moralizing) holds even if the dis- 
tribution is not positive in contrast with those distributions which do not factor 
over directed models and where we have to invoke the Hammersley-Clifford 
theorem to get a similar factorization. See [KF09] for further discussion on pa- 
rameterizations for directed and undirected graphical models. 
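To make the count concrete, here is a minimal sketch assuming binary variables and a hypothetical DAG in which each $X_i$ takes its (at most) $k$ immediate predecessors as parents. Each node needs one free number $P(X_i = 1 \mid \text{parents})$ per parent configuration, i.e. $2^{|\mathrm{Pa}(X_i)|}$ parameters, so the total is at most $n2^k$, compared with $2^n - 1$ for an unrestricted joint. The function names are illustrative only.

```python
from typing import Dict, List


def bn_parameter_count(parents: Dict[str, List[str]]) -> int:
    """Independent parameters of a Bayesian network over binary variables.

    Node X needs one free number P(X = 1 | pa) for each of the
    2 ** len(Pa(X)) joint settings of its parents.
    """
    return sum(2 ** len(pa) for pa in parents.values())


def full_joint_parameter_count(n: int) -> int:
    """Independent parameters of an unrestricted joint over n binary variables."""
    return 2 ** n - 1


def sliding_window_dag(n: int, k: int) -> Dict[str, List[str]]:
    """Hypothetical DAG: the parents of X_i are the (up to) k variables before it."""
    return {f"X{i}": [f"X{j}" for j in range(max(0, i - k), i)] for i in range(n)}


n, k = 20, 3
dag = sliding_window_dag(n, k)
print(bn_parameter_count(dag))        # 143, within the bound n * 2**k = 160
print(full_joint_parameter_count(n))  # 1048575
```

The first three nodes have fewer than $k$ parents, which is why the count (1 + 2 + 4 + 17 * 8 = 143) falls slightly below $n2^k$.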

Our proof scheme requires us to distinguish distributions based on the size
of the irreducible direct interactions between subsets of the covariates. Namely,
we would like to distinguish distributions where there are $O(n)$ covariates
whose joint interaction cannot be factored through smaller interactions (having
fewer than $O(n)$ covariates) chained together by conditional independencies. We
would like to contrast such distributions with others which can be so factored
through factors having only poly(log n) variates in their scope. The measure that
allows us to make this distinction is the number of independent parameters it
takes to specify the distribution. When the size of the smallest irreducible
interactions is $O(n)$, then we need $O(c^n)$ parameters, where $c > 1$. On the
other hand, if we were able to demonstrate that the distribution factors through
interactions which always have scope poly(log n), then we would need only
$O(c^{\mathrm{poly}(\log n)})$ parameters. See Fig. 3.1.

Figure 3.1: A range limited joint distribution on n covariates that has poly(log n)-
parametrization. Although interactions between variables may be ample within
their range, that range is limited to poly(log n).

Let us consider the example of a Markov random field. By Hammersley-Clifford,
it is also a Gibbs random field over the set $\mathcal{C}$ of maximal cliques in the
graph encoding the neighborhood system of the Markov random field. This
Gibbs field comes with conditional independence assurance, and therefore we
have an upper bound on the number of parameters it takes to specify the
distribution. Namely, it is just $\sum_{C \in \mathcal{C}} 2^{|C|}$. Thus, if at most
$k < n$ variables interact directly at a time, then the largest clique size would be
$k$, and this would give us a more economical parameterization than the one
which requires $2^n - 1$ parameters.
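The bound $\sum_{C} 2^{|C|}$ over maximal cliques can be computed directly once the cliques are known. The sketch below assumes binary variables and a hypothetical chain-shaped neighborhood system, and enumerates maximal cliques with the textbook Bron-Kerbosch recursion; that algorithm is our choice for illustration, and any maximal-clique enumeration would serve.

```python
from typing import Dict, FrozenSet, List, Set


def maximal_cliques(adj: Dict[int, Set[int]]) -> List[FrozenSet[int]]:
    """Enumerate maximal cliques with the basic (pivot-free) Bron-Kerbosch recursion."""
    found: List[FrozenSet[int]] = []

    def expand(r: Set[int], p: Set[int], x: Set[int]) -> None:
        if not p and not x:
            found.append(frozenset(r))  # r cannot be extended, so it is maximal
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}  # v has been fully explored
            x = x | {v}

    expand(set(), set(adj), set())
    return found


def gibbs_parameter_bound(adj: Dict[int, Set[int]]) -> int:
    """Sum over maximal cliques C of 2 ** |C| (binary variables assumed)."""
    return sum(2 ** len(c) for c in maximal_cliques(adj))


def path_graph(n: int) -> Dict[int, Set[int]]:
    """Hypothetical neighborhood system: X_i interacts only with X_{i-1} and X_{i+1}."""
    return {i: {j for j in (i - 1, i + 1) if 0 <= j < n} for i in range(n)}


n = 20
print(len(maximal_cliques(path_graph(n))))   # 19 (the edges of the path)
print(gibbs_parameter_bound(path_graph(n)))  # 19 * 2**2 = 76
print(2 ** n - 1)                            # 1048575
```

Here every clique has size $k = 2$, so the bound grows linearly in $n$ rather than as $2^n$.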

In Chapter 5, we will build machinery that shows that if a problem lies in P
as a result of a range limited algorithm (like monadic LFP), then the
factorization of the distribution of solutions to that problem causes it to have
an economical parametrization, precisely because variables do not interact all
at once, but rather in smaller subsets, in a directed manner that gives us
conditional independencies between sets that are of size poly(log n).

Note that the case where all n variates are independent falls into the range 
limited category with range being one. The resulting distribution is ample. 

3.1.2 Value Limited Interactions 

In the previous section we saw the first type of interaction between n
covariates that can be parametrized by just poly(log n) independent parameters.
This was the case where the n variates interact directly only poly(log n) at a
time, and such interactions are chained together through conditional
independencies. In this section, we will see another such limited interaction,
where the n variates do interact directly $O(n)$ at a time, but they are restricted
to taking only $c^{\mathrm{poly}(\log n)}$ many distinct values (see Fig. 3.2). One sees
immediately that the underlying limitation in both this case and the previous is
common: the set of n covariates does not take $2^n$ different values with
extensive $O(n)$ correlations that do not factor through conditional
independencies, as a "true" (or, more precisely, typical) joint distribution of n
variates would. Instead, in both cases, the n covariates behave in ways similar to
a system of poly(log n) covariates. Namely, in both cases, their "jointness"
resembles that of a system of poly(log n) covariates.

How do we precisely state this property? Through the notion of independent
parameters. We will measure the jointness of a distribution by the number
of independent parameters required to specify it. A "true" joint distribution
takes $O(c^n)$, $c > 1$, independent parameters to specify. On the other hand, both
range and value limited interactions require only $O(c^{\mathrm{poly}(\log n)})$
independent parameters to specify. This is the crux of the P = NP question, as we
shall see. In particular, we shall see that in the hard phases of problems such as
k-SAT for $k > 8$, $O(c^{\mathrm{poly}(\log n)})$ independent parameters simply will
not suffice to explain the behavior of the solution space of the problem. We will
recall this behavior in some detail in Chapter 6. We should stress that this
behavior has been rigorously shown to hold for some phases of k-SAT for high
values of k. It is in these phases that our separation of complexity classes can be
demonstrated, not elsewhere. We should also point out that once we have
isolated the precise notion that is at the heart of polynomial time computation,
namely poly(log n)-parametrizability of the space of solutions, several apparent
issues resolve themselves. Take the case of clustering in XORSAT, for instance.
We need only note that the linear nature of XORSAT solution spaces means there
is a poly(log n)-parametrization (the basis provides this for linear spaces). The
core issue is that of the number of independent parameters it takes to specify the
distribution of the entire space of solutions.
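The "basis" description of an XORSAT solution space can be illustrated with standard linear algebra. Assuming the instance is given as a linear system $A\mathbf{x} = \mathbf{b}$ over GF(2), Gauss-Jordan elimination yields one particular solution together with a basis of the null space of $A$; these vectors generate all $2^d$ solutions, where $d$ is the null-space dimension. This is a generic sketch, not the author's construction, and the instance below is hypothetical.

```python
from itertools import product
from typing import List, Optional, Tuple

Vector = List[int]


def solve_gf2(a: List[Vector], b: Vector) -> Optional[Tuple[Vector, List[Vector]]]:
    """Solve A x = b over GF(2) by Gauss-Jordan elimination.

    Returns (x0, basis), where x0 is one solution and basis spans the null
    space of A, or None if the system is inconsistent. Every solution is
    x0 XOR-ed with some subset of the basis vectors.
    """
    n = len(a[0])
    rows = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix [A | b]
    pivots: List[int] = []
    r = 0
    for col in range(n):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue  # no pivot: this column is a free variable
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col]:
                rows[i] = [x ^ y for x, y in zip(rows[i], rows[r])]
        pivots.append(col)
        r += 1
    if any(row[n] and not any(row[:n]) for row in rows):
        return None  # some equation reduced to 0 = 1
    x0 = [0] * n
    for i, col in enumerate(pivots):
        x0[col] = rows[i][n]  # free variables set to 0
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        v = [0] * n
        v[f] = 1
        for i, col in enumerate(pivots):
            v[col] = rows[i][f]  # pivot value that cancels x_f in row i
        basis.append(v)
    return x0, basis


def all_solutions(x0: Vector, basis: List[Vector]) -> List[Vector]:
    """Expand the 2 ** len(basis) solutions described by (x0, basis)."""
    sols = []
    for coeffs in product((0, 1), repeat=len(basis)):
        x = x0[:]
        for c, v in zip(coeffs, basis):
            if c:
                x = [p ^ q for p, q in zip(x, v)]
        sols.append(x)
    return sols


# Hypothetical XORSAT instance: x1 + x2 = 1 and x3 + x4 = 0 (mod 2).
A = [[1, 1, 0, 0], [0, 0, 1, 1]]
b = [1, 0]
x0, basis = solve_gf2(A, b)
print(x0, basis)                      # [1, 0, 0, 0] [[1, 1, 0, 0], [0, 0, 1, 1]]
print(len(all_solutions(x0, basis)))  # 4
```

The sketch only shows how a basis describes a linear solution space; it does not by itself establish the parameter counts discussed in this section.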

In both cases, range limited and value limited interactions, the system of
n covariates behaves as though it were a system of only poly(log n)
covariates. In the range limited case, this is because the covariates only
jointly vary with poly(log n) other variates at a time. In the case of value
limited interactions, this is because though $O(n)$ variates vary jointly, they
only take $2^{\mathrm{poly}(\log n)}$ joint values. Thus, in both cases, the joint
distribution has a very economical parametrization using only
$2^{\mathrm{poly}(\log n)}$ independent parameters.

In later chapters, we will build machinery to see that polynomial time LFP 
algorithms can capture either range or value limited behaviors, but not the joint 
behavior of a "true" joint distribution of n covariates. 

It is also useful to notice that neither type of limitation implies the other.
For instance, n independent variates are range limited, but not value limited,
whereas the distribution supported on the all-1 and all-0 tuples is value limited,
but not range limited. Regimes of problems where the distributions of solutions
are neither value limited nor range limited cause the failure of polynomial time
algorithms.

Figure 3.2: A value limited joint distribution on n covariates that has poly(log n)-
parametrization. Although interactions between variables are $O(n)$ at a time,
they do not display ampleness in their joint distribution.
3.1.3 On the Atypical Nature of poly(log n)-parameterization

We briefly mentioned earlier that the typical member of the space of
distributions on n covariates requires on the order of $2^n$ parameters. Note that
this is a statistical statement. Namely, if we picked an n-variable distribution at
random, with high likelihood we would get a distribution that required on the
order of $2^n$ parameters to specify. In other words, with high likelihood, we
would not get a poly(log n)-parametrizable distribution. This observation may be
used to state results about average case complexity in hard phases of random
k-SAT. In many ways, these hard phases are simply typical, nothing more. The
solution space shows the behavior of a typical joint distribution on n covariates
in that it is ample and correlated. It is polynomial time solution spaces that are
atypical for n-variate distributions in that they are either not ample (the value
limited case) or not correlated solidly enough (the range limited case, where
they admit Gibbs factorizations into smaller potentials).
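A small numerical illustration of "typical" versus factorizable, under stated assumptions: with probability one, a pmf on three binary variables drawn uniformly from the probability simplex satisfies no exact conditional independence, so it needs all $2^3 - 1 = 7$ parameters, whereas a hypothetical Markov chain $X_1 \to X_2 \to X_3$ satisfies $X_1 \perp X_3 \mid X_2$ and needs only $1 + 2 + 2 = 5$. The sampling scheme and the chain's numbers are our choices for illustration.

```python
import random
from itertools import product
from typing import Dict, Tuple

Outcome = Tuple[int, int, int]
Joint = Dict[Outcome, float]


def random_joint(rng: random.Random) -> Joint:
    """Sample a pmf on three binary variables uniformly from the simplex
    (normalized i.i.d. Exp(1) weights give a Dirichlet(1, ..., 1) draw)."""
    w = {x: rng.expovariate(1.0) for x in product((0, 1), repeat=3)}
    total = sum(w.values())
    return {x: v / total for x, v in w.items()}


def markov_chain_joint() -> Joint:
    """Hypothetical chain X1 -> X2 -> X3, so X1 and X3 are independent given X2."""
    p1 = {0: 0.3, 1: 0.7}
    p2 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # p2[x1][x2]
    p3 = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.5, 1: 0.5}}  # p3[x2][x3]
    return {(a, b, c): p1[a] * p2[a][b] * p3[b][c] for a, b, c in product((0, 1), repeat=3)}


def ci_gap(p: Joint) -> float:
    """max over (x1, x2, x3) of |P(x1,x2,x3) P(x2) - P(x1,x2) P(x2,x3)|.
    It is zero exactly when X1 is independent of X3 given X2."""
    def marg(keep: Tuple[int, ...]) -> Dict[Tuple[int, ...], float]:
        out: Dict[Tuple[int, ...], float] = {}
        for x, v in p.items():
            key = tuple(x[i] for i in keep)
            out[key] = out.get(key, 0.0) + v
        return out

    p2, p12, p23 = marg((1,)), marg((0, 1)), marg((1, 2))
    return max(abs(v * p2[(x[1],)] - p12[(x[0], x[1])] * p23[(x[1], x[2])]) for x, v in p.items())


print(ci_gap(markov_chain_joint()))            # ~0 (floating-point rounding only)
print(ci_gap(random_joint(random.Random(0))))  # strictly positive
```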

This short section owes its existence to Leonid Levin and Avi Wigderson, 
both of whom asked us whether our methods could be used to make statements 
about average case complexity. We will return to this issue in future versions of
this paper or in the manuscript [Deo10], which is under preparation.

3.1.4 Our Treatment of Range and Value Limited Distributions 

The two types of distributions that we have mentioned above are only
superficially dissimilar. In both cases, the range of behaviors of the n covariates
can be parametrized with the number of independent parameters it takes to
specify a joint distribution of only poly(log n) covariates. For purposes of
pedagogy, we will disregard this superficial dissimilarity and provide a full
treatment of the range limited case. We can even think of value limited behavior
as a type of range limited behavior where, even though a covariate sees $O(n)$
other covariates, it only utilizes a poly(log n) amount of the information in them
in order to make its decision.

We end this chapter by tying poly(log n)-parameterizations to Markov or,
equivalently, Gibbs models (and likewise for directed models). Once again,
consider two kinds of poly(log n)-parameterizations: range limited and value
limited. A range limited parametrization would correspond to a Gibbs field
whose potentials are specified over maximal cliques of size poly(log n). A value
limited parameterization could have maximal cliques of size $O(n)$, but the
number of parameters for such a clique would only be $2^{\mathrm{poly}(\log n)}$
instead of the possible $2^{O(n)}$. In either case, the random field would have a
poly(log n)-parametrization. See Figs. 3.1 and 3.2.
4. Logical Descriptions of
Computations



Work in finite model theory and descriptive complexity theory — a branch of 
finite model theory that studies the expressive power of various logics in terms 
of complexity classes — has resulted in machine independent characterizations 
of various complexity classes. In particular, over ordered structures, there is 
a precise and highly insightful characterization of the class of queries that are 
computable in polynomial time, and those that are computable in polynomial 
space. In order to keep the treatment relatively complete, we begin with a brief 
precis of this theory. Readers from a finite model theory background may skip 
this chapter. 

We quickly set notation. A vocabulary, denoted by $\sigma$, is a set consisting of
finitely many relation and constant symbols,

$$\sigma = \langle R_1, \ldots, R_m, c_1, \ldots, c_s \rangle.$$

Each relation has a fixed arity. We consider only relational vocabularies, in that
there are no function symbols. This poses no shortcomings since functions may
be encoded as relations. A $\sigma$-structure $\mathfrak{A}$ consists of a set $A$ which
is the universe of $\mathfrak{A}$, interpretations $R^{\mathfrak{A}}$ for each of the relation
symbols in the vocabulary, and interpretations $c^{\mathfrak{A}}$ for each of the constant
symbols in the vocabulary. Namely,

$$\mathfrak{A} = \langle A, R_1^{\mathfrak{A}}, \ldots, R_m^{\mathfrak{A}}, c_1^{\mathfrak{A}}, \ldots, c_s^{\mathfrak{A}} \rangle.$$

An example is the vocabulary of graphs which consists of a single relation
symbol having arity two. Then, a graph may be seen as a structure over this
vocabulary, where the universe is the set of nodes, and the relation symbol is
interpreted as the edge relation. In addition, some applications may require us
to work with a graph vocabulary having two constants interpreted in the
structure as source and sink nodes respectively.

We also denote by $\sigma_n$ the extension of $\sigma$ by $n$ additional constants, and
denote by $(\mathfrak{A}, \mathbf{a})$ the structure where the tuple $\mathbf{a}$ has been
identified with these additional constants.

4.1 Inductive Definitions and Fixed Points 

The material in this section is standard, and we refer the reader to [Mos74] for 
the first monograph on the subject, and to [EF06, Lib04] for detailed treatments 
in the context of finite model theory. See [Imm99] for a text on descriptive com- 
plexity theory. Our treatment is taken mostly from these sources, and stresses 
the facts we need. 

Inductive definitions are a fundamental primitive of mathematics. The idea
is to build up a set in stages, where the defining relation for each stage can be
written in the first order language of the underlying structure and uses elements
added to the set in previous stages. In the most general case, there is an
underlying structure $\mathfrak{A} = \langle A, R_1, \ldots, R_m \rangle$ and a formula

$$\varphi(S, \mathbf{x}) = \varphi(S, x_1, \ldots, x_n)$$

in the first-order language of $\mathfrak{A}$. The variable $S$ is a second-order relation
variable that will eventually hold the set we are trying to build up in stages. At
the $\xi$th stage of the induction, denoted by $I^\xi_\varphi$, we insert into the relation
$S$ the tuples according to

$$\mathbf{x} \in I^\xi_\varphi \iff \varphi\Big(\bigcup_{\eta < \xi} I^\eta_\varphi, \mathbf{x}\Big).$$

We will denote the stage at which a tuple $\mathbf{x}$ enters the relation in the
induction defined by $\varphi$ by $|\mathbf{x}|_\varphi$. The decomposition into its various
stages is a central characteristic of inductively defined relations. We will also
require that $\varphi$ have only positive occurrences of the $n$-ary relation variable
$S$, namely, all occurrences of $S$ must be within the scope of an even number of
negations. Such inductions are called positive elementary. In the most general
case, a transfinite induction may result. The least ordinal $\kappa$ at which
$I^\kappa_\varphi = I^{\kappa+1}_\varphi$ is called the closure ordinal of the induction, and is
denoted by $|\varphi^{\mathfrak{A}}|$. When the underlying structures are finite, this is also
known as the inductive depth. Note that the cardinality of the ordinal $\kappa$ is at
most $|A|^n$.

Finally, we define the relation

$$I_\varphi = \bigcup_{\xi} I^\xi_\varphi.$$

Sets of the form $I_\varphi$ are known as fixed points of the structure. Relations that
may be defined by

$$R(\mathbf{x}) \iff I_\varphi(\mathbf{a}, \mathbf{x})$$

for some choice of tuple $\mathbf{a}$ over $A$ are known as inductive relations. Thus,
inductive relations are sections of fixed points.

Note that there are definitions of the set $I_\varphi$ that are equivalent, but can be
stated only in the second order language of $\mathfrak{A}$. Note that the definition
above is

1. elementary at each stage, and

2. constructive.

We will use both these properties throughout our work.

We now proceed more formally by introducing operators and their fixed
points, and then consider the operators on structures that are induced by first
order formulae. We begin by defining two classes of operators on sets.

Definition 4.1. Let $A$ be a finite set, and $\mathcal{P}(A)$ be its power set. An operator
$F$ on $A$ is a function $F : \mathcal{P}(A) \to \mathcal{P}(A)$. The operator $F$ is monotone if it
respects subset inclusion, namely, for all subsets $X, Y$ of $A$, if $X \subseteq Y$, then
$F(X) \subseteq F(Y)$. The operator $F$ is inflationary if it maps sets to their
supersets, namely, $X \subseteq F(X)$.

Next, we define sequences induced by operators, and characterize the
sequences induced by monotone and inflationary operators.
Definition 4.2. Let $F$ be an operator on $A$. Consider the sequence of sets
$F^0, F^1, \ldots$ defined by

$$F^0 = \emptyset, \qquad F^{i+1} = F(F^i). \tag{4.1}$$

This sequence $(F^i)$ is called inductive if it is increasing, namely, if
$F^i \subseteq F^{i+1}$ for all $i$. In this case, we define

$$F^\infty := \bigcup_{i=0}^{\infty} F^i. \tag{4.2}$$

Lemma 4.3. If $F$ is either monotone or inflationary, the sequence $(F^i)$ is inductive.

Now we are ready to define fixed points of operators on sets. 

Definition 4.4. Let F be an operator on A. The set X C A is called a fixed point 
of F if F{X) = X. A fixed point X of F is called its least fixed point, denoted 
LFP(F), if it is contained in every other fixed point Y of F, namely, X C Y 
whenever F is a fixed point of F. 

Not all operators have fixed points, let alone least fixed points. The
Tarski-Knaster theorem guarantees that monotone operators do, and also provides
two constructions of the least fixed point for such operators: one "from above"
and the other "from below." The latter construction uses the sequences (4.1).

Theorem 4.5 (Tarski-Knaster). Let $F$ be a monotone operator on a set $A$.

1. $F$ has a least fixed point $\mathrm{LFP}(F)$ which is the intersection of all the
fixed points of $F$. Namely,

$$\mathrm{LFP}(F) = \bigcap \{ Y : Y = F(Y) \}.$$

2. $\mathrm{LFP}(F)$ is also equal to the union of the stages of the sequence $(F^i)$
defined in (4.1). Namely,

$$\mathrm{LFP}(F) = \bigcup_i F^i = F^\infty.$$
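Both constructions in Theorem 4.5 can be checked by brute force on a tiny universe. The sketch below assumes a hypothetical monotone operator $F(X) = \{0\} \cup \{v : (u, v) \in E,\ u \in X\}$ ("reachable from node 0") on a five-element universe; it has two fixed points, and both the intersection "from above" and the iteration (4.1) "from below" return the smaller one.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, Sequence

Operator = Callable[[FrozenSet[int]], FrozenSet[int]]


def power_set(universe: Sequence[int]) -> List[FrozenSet[int]]:
    """All subsets of a finite universe."""
    return [frozenset(c) for r in range(len(universe) + 1) for c in combinations(universe, r)]


def is_monotone(f: Operator, universe: Sequence[int]) -> bool:
    """Brute-force check that X <= Y implies F(X) <= F(Y)."""
    subsets = power_set(universe)
    return all(f(x) <= f(y) for x in subsets for y in subsets if x <= y)


def fixed_points(f: Operator, universe: Sequence[int]) -> List[FrozenSet[int]]:
    return [y for y in power_set(universe) if f(y) == y]


def lfp_from_above(f: Operator, universe: Sequence[int]) -> FrozenSet[int]:
    """Part 1 of Theorem 4.5: intersect every fixed point Y = F(Y)."""
    result = frozenset(universe)
    for y in fixed_points(f, universe):
        result &= y
    return result


def lfp_from_below(f: Operator) -> FrozenSet[int]:
    """Part 2: iterate F^0 = {}, F^{i+1} = F(F^i) until F^{i+1} = F^i.
    Assumes f is monotone, so the stages increase and the loop terminates."""
    current: FrozenSet[int] = frozenset()
    while (nxt := f(current)) != current:
        current = nxt
    return current


EDGES = {(0, 1), (1, 2), (3, 4), (4, 3)}  # hypothetical directed graph
UNIVERSE = list(range(5))


def reach_from_0(x: FrozenSet[int]) -> FrozenSet[int]:
    """F(X) = {0} | {v : (u, v) in EDGES for some u in X}."""
    return frozenset({0} | {v for (u, v) in EDGES if u in x})


print(is_monotone(reach_from_0, UNIVERSE))                              # True
print(sorted(sorted(y) for y in fixed_points(reach_from_0, UNIVERSE)))  # [[0, 1, 2], [0, 1, 2, 3, 4]]
print(sorted(lfp_from_below(reach_from_0)))                             # [0, 1, 2]
print(sorted(lfp_from_above(reach_from_0, UNIVERSE)))                   # [0, 1, 2]
```

The cycle between nodes 3 and 4 is what creates the second, non-least fixed point: it is self-supporting but not reachable from the seed.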

However, not all operators are monotone; therefore we need a means of con- 
structing fixed points for non-monotone operators. 
Definition 4.6. For an inflationary operator $F$, the sequence $(F^i)$ is inductive,
and hence eventually stabilizes to the fixed point $F^\infty$. For an arbitrary
operator $G$, we associate the inflationary operator $G_{\mathrm{infl}}$ defined by
$G_{\mathrm{infl}}(Y) = Y \cup G(Y)$. The set $G_{\mathrm{infl}}^\infty$ is called the inflationary
fixed point of $G$, and denoted by $\mathrm{IFP}(G)$.

Definition 4.7. Consider the sequence $(F^i)$ induced by an arbitrary operator $F$
on $A$. The sequence may or may not stabilize. In the first case, there is a
positive integer $n$ such that $F^{n+1} = F^n$, and therefore $F^m = F^n$ for all
$m > n$. In the latter case, the sequence $(F^i)$ does not stabilize, namely,
$F^n \neq F^{n+1}$ for all $n \le 2^{|A|}$. We define the partial fixed point of $F$,
denoted $\mathrm{PFP}(F)$, as $F^n$ in the first case, and the empty set in the
second case.
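Definitions 4.6 and 4.7 differ sharply on non-monotone operators. The sketch below assumes two hypothetical operators on the universe $\{0, 1, 2\}$: complementation, whose stages alternate $\emptyset, A, \emptyset, \ldots$ forever (so its PFP is empty while its IFP is all of $A$), and an operator that seeds $\{0\}$ and then adds $1$, which is not monotone but does stabilize.

```python
from typing import Callable, FrozenSet, List

Operator = Callable[[FrozenSet[int]], FrozenSet[int]]


def stages(f: Operator, count: int) -> List[FrozenSet[int]]:
    """Return [F^0, ..., F^count], where F^0 = {} and F^{i+1} = F(F^i)."""
    seq: List[FrozenSet[int]] = [frozenset()]
    for _ in range(count):
        seq.append(f(seq[-1]))
    return seq


def inflationary_fixed_point(g: Operator, size: int) -> FrozenSet[int]:
    """IFP(G) = G_infl^infinity, with G_infl(Y) = Y | G(Y), on a universe of `size` elements."""
    def g_infl(y: FrozenSet[int]) -> FrozenSet[int]:
        return y | g(y)
    # The stages only grow, so they are constant from stage `size` onwards.
    return stages(g_infl, size + 1)[-1]


def partial_fixed_point(f: Operator, size: int) -> FrozenSet[int]:
    """PFP(F): F^n if F^n = F^{n+1} for some n <= 2**size, else the empty set."""
    bound = 2 ** size
    seq = stages(f, bound + 1)
    for n in range(bound + 1):
        if seq[n] == seq[n + 1]:
            return seq[n]
    return frozenset()


A = frozenset(range(3))  # hypothetical universe {0, 1, 2}


def complement(y: FrozenSet[int]) -> FrozenSet[int]:
    return A - y  # not monotone: the stages alternate {}, A, {}, ...


def seed_then_add_one(y: FrozenSet[int]) -> FrozenSet[int]:
    return frozenset({0}) if not y else y | {1}


print(sorted(partial_fixed_point(complement, 3)))         # []
print(sorted(inflationary_fixed_point(complement, 3)))    # [0, 1, 2]
print(sorted(partial_fixed_point(seed_then_add_one, 3)))  # [0, 1]
```

Checking up to $n = 2^{|A|}$ suffices because there are only $2^{|A|}$ subsets, so a sequence that has not stabilized by then must be cycling.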

4.2 Fixed Point Logics for P and PSPACE 

We now specialize the theory of fixed points of operators to the case where the 
operators are defined by means of first order formulae. 

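Definition 4.8. Let $\sigma$ be a relational vocabulary, and $R$ a relational symbol of
arity $k$ that is not in $\sigma$. Let $\varphi(R, x_1, \ldots, x_k) = \varphi(R, \mathbf{x})$ be a
formula of vocabulary $\sigma \cup \{R\}$. Now consider a structure $\mathfrak{A}$ of
vocabulary $\sigma$. The formula $\varphi(R, \mathbf{x})$ defines an operator
$F_\varphi : \mathcal{P}(A^k) \to \mathcal{P}(A^k)$ on $A^k$ which acts on a subset
$X \subseteq A^k$ as

$$F_\varphi(X) = \{ \mathbf{a} \mid \mathfrak{A} \models \varphi(X/R, \mathbf{a}) \}, \tag{4.3}$$

where $\varphi(X/R, \mathbf{a})$ means that $R$ is interpreted as $X$ in $\mathfrak{A}$.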

We wish to extend FO by adding fixed points of operators of the form $F_\varphi$,
where $\varphi$ is a formula in FO. This gives us fixed point logics which play a
central role in descriptive complexity theory.

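Definition 4.9. Let the notation be as above.

1. The logic FO(IFP) is obtained by extending FO with the following formation
rule: if $\varphi(R, \mathbf{x})$ is a formula and $\mathbf{t}$ a $k$-tuple of terms, then
$[\mathrm{IFP}_{R,\mathbf{x}}\, \varphi(R, \mathbf{x})](\mathbf{t})$ is a formula whose free variables
are those of $\mathbf{t}$. The semantics are given by

$$\mathfrak{A} \models [\mathrm{IFP}_{R,\mathbf{x}}\, \varphi(R, \mathbf{x})](\mathbf{a}) \quad\text{iff}\quad \mathbf{a} \in \mathrm{IFP}(F_\varphi).$$

2. The logic FO(PFP) is obtained by extending FO with the following formation
rule: if $\varphi(R, \mathbf{x})$ is a formula and $\mathbf{t}$ a $k$-tuple of terms, then
$[\mathrm{PFP}_{R,\mathbf{x}}\, \varphi(R, \mathbf{x})](\mathbf{t})$ is a formula whose free variables
are those of $\mathbf{t}$. The semantics are given by

$$\mathfrak{A} \models [\mathrm{PFP}_{R,\mathbf{x}}\, \varphi(R, \mathbf{x})](\mathbf{a}) \quad\text{iff}\quad \mathbf{a} \in \mathrm{PFP}(F_\varphi).$$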



We cannot define the closure of FO under taking least fixed points in the 
above manner without further restrictions since least fixed points are guaran- 
teed to exist only for monotone operators, and testing for monotonicity is un- 
decidable. If we were to form a logic by extending FO by least fixed points 
without further restrictions, we would obtain a logic with an undecidable syn- 
tax. Hence, we make some restrictions on the formulae which guarantee that 
the operators obtained from them as described by (4.3) will be monotone, and 
thus will have a least fixed point. We need a definition. 

Definition 4.10. Let notation be as earlier. Let $\varphi$ be a formula containing a
relational symbol $R$. An occurrence of $R$ is said to be positive if it is under the
scope of an even number of negations, and negative if it is under the scope of an
odd number of negations. A formula is said to be positive in $R$ if all occurrences
of $R$ in it are positive, or there are no occurrences of $R$ at all. In particular,
there are no negative occurrences of $R$ in the formula.

Lemma 4.11. Let notation be as earlier. If the formula $\varphi(R, \mathbf{x})$ is positive
in $R$, then the operator $F_\varphi$ obtained from $\varphi$ by construction (4.3) is monotone.
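Lemma 4.11 can be sanity-checked (not proved) by brute force on a small structure. The sketch assumes a hypothetical three-element structure with a binary relation $E$, and compares the operator of $\varphi(R, x) = \exists y\,(E(x, y) \land R(y))$, where $R$ occurs positively, with that of $\varphi(R, x) = \lnot R(x)$, where $R$ occurs under one negation.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List

UNIVERSE = [0, 1, 2]
E = {(0, 1), (1, 2)}  # hypothetical binary relation of the structure


def subsets(items: List[int]) -> List[FrozenSet[int]]:
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def is_monotone(f: Callable[[FrozenSet[int]], FrozenSet[int]]) -> bool:
    """Brute-force check of X <= Y  =>  F(X) <= F(Y) over all subsets of UNIVERSE."""
    subs = subsets(UNIVERSE)
    return all(f(x) <= f(y) for x in subs for y in subs if x <= y)


def f_positive(r: FrozenSet[int]) -> FrozenSet[int]:
    """Operator of phi(R, x) = exists y (E(x, y) and R(y)): R occurs positively."""
    return frozenset(x for x in UNIVERSE if any((x, y) in E and y in r for y in UNIVERSE))


def f_negative(r: FrozenSet[int]) -> FrozenSet[int]:
    """Operator of phi(R, x) = not R(x): R occurs under one negation."""
    return frozenset(x for x in UNIVERSE if x not in r)


print(is_monotone(f_positive))  # True
print(is_monotone(f_negative))  # False
```

The converse of the lemma does not hold in general; this check only exhibits the positive and negative cases on one structure.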

Now we can define the closure of FO under least fixed points of operators 
obtained from formulae that are positive in a relational variable. 
Definition 4.12. The logic FO(LFP) is obtained by extending FO with the
following formation rule: if $\varphi(R, \mathbf{x})$ is a formula that is positive in the
$k$-ary relational variable $R$, and $\mathbf{t}$ is a $k$-tuple of terms, then
$[\mathrm{LFP}_{R,\mathbf{x}}\, \varphi(R, \mathbf{x})](\mathbf{t})$ is a formula whose free variables are
those of $\mathbf{t}$. The semantics are given by

$$\mathfrak{A} \models [\mathrm{LFP}_{R,\mathbf{x}}\, \varphi(R, \mathbf{x})](\mathbf{a}) \quad\text{iff}\quad \mathbf{a} \in \mathrm{LFP}(F_\varphi).$$

As earlier, the stage at which the tuple $\mathbf{a}$ enters the relation $R$ is denoted
by $|\mathbf{a}|^{\mathfrak{A}}_\varphi$, and inductive depths are denoted by $|\varphi^{\mathfrak{A}}|$. This
is well defined for least fixed points since a tuple enters a relation only once,
and is never removed from it afterwards. In fixed points (such as partial fixed
points) where the underlying formula is not necessarily positive, this is not true.
A tuple may enter and leave the relation being built multiple times.

Next, we informally state two well-known results on the expressive power 
of fixed point logics. First, adding the ability to do simultaneous induction 
over several formulae does not increase the expressive power of the logic, and 
secondly FO(IFP) = FO(LFP) over finite structures. See [Lib04, §10.3, p. 184] for 
details. 

We have introduced various fixed point constructions and extensions of first 
order logic by these constructions. We end this section by relating these log- 
ics to various complexity classes. These are the central results of descriptive 
complexity theory. 

Fagin [Fag74] obtained the first machine independent logical characterization
of an important complexity class. Here, $\exists$SO refers to the restriction of
second-order logic to formulae of the form

$$\exists X_1 \cdots \exists X_m \, \varphi,$$

where $\varphi$ does not have any second-order quantification.

Theorem 4.13 (Fagin).

$$\exists\mathrm{SO} = \mathrm{NP}.$$

Immerman [Imm82] and Vardi [Var82] obtained the following central result
that captures the class P on ordered structures.

Theorem 4.14 (Immerman-Vardi). Over finite, ordered structures, the queries
expressible in the logic FO(LFP) are precisely those that can be computed in
polynomial time. Namely,

FO(LFP) = P.

A characterization of PSPACE in terms of PFP was obtained in [AV91, 
Var82]. 

Theorem 4.15 (Abiteboul-Vianu, Vardi). Over finite, ordered structures, the queries 
expressible in the logic FO(PFP) are precisely those that can be computed in polynomial 
space. Namely, 

FO(PFP) = PSPACE. 

Note: We will often use the term LFP generically instead of FO(LFP) when we 
wish to emphasize the fixed point construction being performed, rather than the 
language. 
5. The Link Between Polynomial
Time Computation and Conditional 
Independence 

In Chapter 2 we saw how certain joint distributions that encode interactions 
between collections of variables "factor through" smaller, simpler interactions. 
This necessarily affects the type of influence a variable may exert on other vari- 
ables in the system. Thus, while a variable in such a system can exert its influ- 
ence throughout the system, this influence must necessarily be bottlenecked by 
the simpler interactions that it must factor through. In other words, the influ- 
ence must propagate with bottlenecks at each stage. In the case where there are 
conditional independencies, the influence can only be "transmitted through" 
the values of the intermediate conditioning variables. 

In this chapter, we will uncover a similar phenomenon underlying the log- 
ical description of polynomial time computation on ordered structures. The 
fundamental observation is the following: 

Least fixed point computations "factor through" first order computations,
and so limitations of first order logic must be the source of the bottleneck at
each stage to the propagation of information in such computations.

The treatment of LFP versus FO in finite model theory centers around the fact
that FO can only express local properties, while LFP allows non-local properties
such as transitive closure to be expressed. We are taking as given the non-local
capability of LFP, and asking how this non-local nature factors at each step, and what
is the effect of such a factorization on the joint distribution of LFP acting upon ensembles.
Fixed point logics allow variables to be non-local in their influence, but this
non-local influence must factor through first order logic at each stage. This is
a very similar underlying idea to the statistical mechanical picture of random
fields over spaces of configurations that we saw in Chapter 2, but comes cloaked
in a very different garb: that of logic and operators. The sequence $(F^i)$ of
stages that constructs a fixed point may be seen as the propagation of influence
in a structure by means of setting values of "intermediate variables". In this
case, the variables are set by inducting them into a relation at various stages
of the induction. We want to understand the stage-wise bottleneck that a fixed
point computation faces at each step of its execution, and tie this back to
notions of conditional independence and factorization of distributions. In order
to accomplish this, we must understand the limitations of each stage of an LFP
computation and understand how this affects the propagation of long-range
influence in relations computed by LFP. Namely, we will bring to bear ideas from
statistical mechanics and message passing on the logical description of
computations.

It will be beneficial to state this intuition with the example of transitive clo- 
sure. 

Example 5.1. The transitive closure of the edge relation of a graph is the
standard example of a non-local property that cannot be expressed by first order
logic. It can be expressed in FO(LFP) as follows. Let $E$ be a binary relation that
expresses the presence of an edge between its arguments. Then we can see that
iterating the positive first order formula $\varphi(R, x, y)$ given by

$$\varphi(R, x, y) \equiv E(x, y) \lor \exists z\, (E(x, z) \land R(z, y))$$

builds the transitive closure relation in stages.
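The staged construction can be traced on a small input. This sketch assumes a hypothetical directed path $0 \to 1 \to 2 \to 3$, builds the operator $F_\varphi$ of the formula above by brute force, iterates (4.1), and records the stage $|(x, y)|_\varphi$ at which each pair enters. Each new pair at stage $i$ needs only an edge $E(x, z)$ plus a pair $(z, y)$ already present at stage $i - 1$, which is the stage-wise locality described next.

```python
from typing import Callable, Dict, FrozenSet, Set, Tuple

Pair = Tuple[int, int]
PairOperator = Callable[[FrozenSet[Pair]], FrozenSet[Pair]]


def tc_operator(edges: Set[Pair], universe: Set[int]) -> PairOperator:
    """F_phi for phi(R, x, y) = E(x, y) or exists z (E(x, z) and R(z, y))."""
    def f(r: FrozenSet[Pair]) -> FrozenSet[Pair]:
        return frozenset(
            (x, y)
            for x in universe
            for y in universe
            if (x, y) in edges or any((x, z) in edges and (z, y) in r for z in universe)
        )
    return f


def lfp_with_stages(f: PairOperator) -> Dict[Pair, int]:
    """Iterate F^0 = {}, F^{i+1} = F(F^i) and record the stage at which each
    tuple first enters. Assumes f is monotone (phi is positive in R)."""
    stage_of: Dict[Pair, int] = {}
    current: FrozenSet[Pair] = frozenset()
    i = 0
    while True:
        nxt = f(current)
        if nxt == current:
            return stage_of
        i += 1
        for t in nxt - current:
            stage_of[t] = i
        current = nxt


# Hypothetical directed path 0 -> 1 -> 2 -> 3.
UNIVERSE = {0, 1, 2, 3}
EDGES = {(0, 1), (1, 2), (2, 3)}
stage = lfp_with_stages(tc_operator(EDGES, UNIVERSE))
print(sorted(stage.items(), key=lambda kv: (kv[1], kv[0])))
# [((0, 1), 1), ((1, 2), 1), ((2, 3), 1), ((0, 2), 2), ((1, 3), 2), ((0, 3), 3)]
print(max(stage.values()))  # inductive depth on this input: 3
```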

Notice that the decision of whether a vertex enters the relation is based on 
the immediate neighborhood of the vertex. In other words, the relation is built 
stage by stage, and at each stage, vertices that have entered a relation make 
other vertices that are adjacent to them eligible to enter the relation at the next 
stage. Thus, though the resulting property is non-local, the information flow used to 
compute it is stage-wise local. The computation factors through a local property
at each stage, but by chaining many such local factors together, we obtain the
non-local relation of transitive closure. This picture relates to a Markov random
field, where such local interactions are chained together in a way that variables
can exert their influence over arbitrary distances, but the factorization of that
influence (encoded in the joint distribution) reveals the stage-wise local nature of
the interaction. There are important differences, however: the flow of LFP
computation is directed, whereas a Markov random field is undirected, for instance.
We have used this simple example just to provide some preliminary intuition. 
We will now proceed to build the requisite framework. 

5.1 The Limitations of LFP 

Many of the techniques in model theory break down when restricted to finite
models. A notable exception is the Ehrenfeucht-Fraïssé game for first order
logic. This has led to considerable research attention to game theoretic
characterizations of various logics. The primary technique for demonstrating the
limitations of fixed point logics in expressing properties is to consider them as
fragments of the logic $\mathcal{L}^\omega_{\infty\omega}$, which extends first order logic with
infinitary connectives, and then use the characterization of expressibility in this
logic in terms of $k$-pebble games. This is, however, not useful for our purpose
(namely, separating P from NP), since NP $\subseteq$ PSPACE and the latter class is
captured by PFP, which is also a fragment of $\mathcal{L}^\omega_{\infty\omega}$.

One of the central contributions of our work is demonstrating a completely 
different viewpoint of LFP computations in terms of the concepts of conditional 
independence and factoring of distributions, both of which are fundamental to 
statistics and probability theory. In order to arrive at this correspondence, we 
will need to understand the limitations of first order logic. Least fixed point 
is an iteration of first order formulas. The limitations of first order formulae 
mentioned in the previous section therefore appear at each step of a least fixed 
point computation. 
Viewing LFP as "stage-wise first order" is central to our analysis. Let us
pause for a while and see how this fits into our global framework. We are in- 
terested in factoring complex interactions between variables into their smallest 
constituent irreducible factors. Viewed this way, LFP has a natural factorization 
into its stages, which are all described by first order formulae. 

Let us now analyze the limitations of the LFP computation through this 
viewpoint. 

5.1.1 Locality of First Order Logic 

The local properties of first order logic have received considerable research at- 
tention and expositions can be found in standard references such as [Lib04, Ch. 
4], [EF06, Ch. 2], [Imm99, Ch. 6]. The basic idea is that first order formulae can 
only "see" up to a certain distance away from their free variables. This distance 
is determined by the quantifier rank of the formula. 

The idea that first order formulae are local has been formalized in essentially
two different ways. This has led to two major notions of locality: Hanf
locality [Han65] and Gaifman locality [Gai82]. Informally, Hanf locality says
that whether or not a first order formula $\varphi$ holds in a structure depends only
on the structure's multiset of isomorphism types of spheres of radius $r$. Gaifman
locality says that whether or not $\varphi$ holds in a structure depends on the number
of elements of that structure having pairwise disjoint $r$-neighborhoods that
fulfill first order formulae of quantifier depth $d$ for some fixed $d$ (which
depends on $\varphi$). Clearly, both notions express properties of combinations of
neighborhoods of fixed size.

In the literature of finite model theory, these properties were developed to 
deal with cases where the neighborhoods of the elements in the structure had 
bounded diameters. In particular, some of the most striking applications of 
such properties are in graphs with bounded degree, such as the linear time al- 
gorithm to evaluate first order properties on bounded degree graphs [See96]. 
In contrast, we will use some of the normal forms developed in the context of 
locality properties in finite model theory, but in the scenario where neighbor- 
hoods of elements have unbounded diameter. Thus, it is not only the locality 
that is of interest to us, but the exact specification of the finitary nature of the
first order computation. We will see that what we need is that first order logic 
can only exploit a bounded number of local properties. We will need both these 
properties in our analysis. 

Recall the notation and definitions from the previous chapter. We need some 
definitions in order to state the results. 

Definition 5.2. The Gaifman graph of a $\sigma$-structure $\mathfrak{A}$ is denoted by $G_{\mathfrak{A}}$ and defined as follows. The set of nodes of $G_{\mathfrak{A}}$ is $A$. There is an edge between two nodes $a_1$ and $a_2$ in $G_{\mathfrak{A}}$ if there is a relation $R$ in $\sigma$ and a tuple $t \in R^{\mathfrak{A}}$ such that both $a_1$ and $a_2$ appear in $t$.
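A minimal sketch of Definition 5.2, assuming a finite structure is encoded as a universe together with a dictionary from (hypothetical) relation names to sets of tuples; this encoding is an illustrative choice, not one fixed by the text:

```python
from itertools import combinations


def gaifman_graph(universe, relations):
    """Return the Gaifman graph of a finite structure as an adjacency dict.

    universe  -- iterable of elements (the set A)
    relations -- dict mapping a relation name to a set of tuples over A
    """
    adjacency = {a: set() for a in universe}
    for tuples in relations.values():
        for t in tuples:
            # Every pair of distinct elements occurring together in t is an edge.
            for a1, a2 in combinations(set(t), 2):
                adjacency[a1].add(a2)
                adjacency[a2].add(a1)
    return adjacency
```

For instance, a single ternary tuple (1, 2, 3) makes 1, 2 and 3 pairwise adjacent, while an element that occurs in no tuple is isolated.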

With the graph defined, we have a notion of distance between elements $a_i, a_j$ of $A$, denoted by $d^{\mathfrak{A}}(a_i, a_j)$, as simply the length of the shortest path between $a_i$ and $a_j$ in $G_{\mathfrak{A}}$. We extend this to a notion of distance between tuples from $A$ as follows. Let $\mathbf{a} = (a_1, \ldots, a_n)$ and $\mathbf{b} = (b_1, \ldots, b_m)$. Then

$$d^{\mathfrak{A}}(\mathbf{a}, \mathbf{b}) = \min\{ d^{\mathfrak{A}}(a_i, b_j) : 1 \le i \le n,\ 1 \le j \le m \}.$$

There is no restriction on n and m above. In particular, the definition above also applies to the case where either of them is equal to one. Namely, we have the notion of distance between a tuple and a singleton element. We are now ready to define neighborhoods of tuples. Recall that $\sigma_n$ is the expansion of $\sigma$ by $n$ additional constants.

Definition 5.3. Let $\mathfrak{A}$ be a $\sigma$-structure and let $\mathbf{a}$ be an $n$-tuple over $A$. The ball of radius $r$ around $\mathbf{a}$ is the set defined by

$$B_r^{\mathfrak{A}}(\mathbf{a}) = \{ b \in A : d^{\mathfrak{A}}(\mathbf{a}, b) \le r \}.$$

The $r$-neighborhood of $\mathbf{a}$ in $\mathfrak{A}$ is the $\sigma_n$-structure $N_r^{\mathfrak{A}}(\mathbf{a})$ whose universe is $B_r^{\mathfrak{A}}(\mathbf{a})$; each relation $R$ is interpreted as $R^{\mathfrak{A}}$ restricted to $B_r^{\mathfrak{A}}(\mathbf{a})$; and the $n$ additional constants are interpreted as $a_1, \ldots, a_n$.

We recall the notion of a type. Informally, if $L$ is a logic (or language), the $L$-type of a tuple is the sum total of the information that can be expressed about it in the language $L$. Thus, the first order type of an $m$-tuple in a structure is defined
as the set of all FO formulae having $m$ free variables that are satisfied by the tuple. Over finite structures, this notion is far too powerful since it characterizes the structure $(\mathfrak{A}, \mathbf{a})$ up to isomorphism. A more useful notion is the local type of a tuple. In particular, a neighborhood is a $\sigma_n$-structure, and a type of a neighborhood is an equivalence class of such structures up to isomorphism. Note that any isomorphism between $N_r^{\mathfrak{A}}(a_1, \ldots, a_n)$ and $N_r^{\mathfrak{B}}(b_1, \ldots, b_n)$ must send $a_i$ to $b_i$ for $1 \le i \le n$.

Definition 5.4. Notation as above. The local r-type of a tuple $\mathbf{a}$ in $\mathfrak{A}$ is the type of $\mathbf{a}$ in the substructure induced by the r-neighborhood of $\mathbf{a}$ in $\mathfrak{A}$, namely in $N_r^{\mathfrak{A}}(\mathbf{a})$.
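The ball $B_r^{\mathfrak{A}}(\mathbf{a})$ and the neighborhood $N_r^{\mathfrak{A}}(\mathbf{a})$ can be sketched with a multi-source breadth-first search on the Gaifman graph, since the distance from a tuple to an element is the minimum over the tuple's coordinates. The adjacency-dict and relation-dict encodings, and the function names, are assumptions made for illustration:

```python
from collections import deque


def distances_from(adjacency, sources):
    """Multi-source BFS on the Gaifman graph.

    Starting from all coordinates of the tuple at once yields, for every
    reachable element b, the minimum over i of d(a_i, b), i.e. d(a, b)."""
    dist = {a: 0 for a in sources}
    queue = deque(dist)
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def ball(adjacency, tup, r):
    """B_r(a): all elements at distance at most r from the tuple a."""
    return {b for b, d in distances_from(adjacency, tup).items() if d <= r}


def neighborhood(adjacency, relations, tup, r):
    """N_r(a): restrict every relation to B_r(a); keep a as distinguished constants."""
    universe = ball(adjacency, tup, r)
    restricted = {
        name: {t for t in tuples if all(x in universe for x in t)}
        for name, tuples in relations.items()
    }
    return universe, restricted, tuple(tup)
```

On the path 1-2-3-4-5, the ball of radius 1 around the pair (1, 5) is {1, 2, 4, 5}, illustrating that the ball around a tuple is the union of the balls around its coordinates.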

In what follows, we may drop the superscript if the underlying structure is 
clear. The following three notions of locality are used in stating the results. 

Definition 5.5. 1. Formulas whose truth at a tuple $\mathbf{a}$ depends only on $N_r(\mathbf{a})$ are called r-local. In other words, quantification in such formulas is restricted to the structure $N_r(\mathbf{x})$.

2. Formulas that are r-local around their variables for some value of r are said to be local.

3. Boolean combinations of formulas that are local around the various coordinates $x_i$ of $\mathbf{x}$ are said to be basic local.

As mentioned earlier, there are two broad flavors of locality results in the literature: those that follow from Hanf's theorem, and those that follow from Gaifman's theorem. The first kind relates two different structures. [Han65] proved his result for infinite structures. We provide below the locality result due to [FSV95] that is suitable for finite models. To proceed, we need a definition.

Definition 5.6. Let $\mathfrak{A}, \mathfrak{B}$ be $\sigma$-structures and let $m \in \mathbb{N}$. If for every isomorphism type $\tau$ of an r-neighborhood of a point, either

1. both $\mathfrak{A}$ and $\mathfrak{B}$ have the same number of elements of type $\tau$, or

2. both $\mathfrak{A}$ and $\mathfrak{B}$ have more than $m$ elements of type $\tau$,

then we say that $\mathfrak{A}$ and $\mathfrak{B}$ are threshold $(r, m)$-equivalent.
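Threshold $(r, m)$-equivalence is a simple counting condition once each element has been labelled with the isomorphism type of its r-neighborhood. The sketch below assumes those labels are already computed (deciding isomorphism of neighborhoods is the expensive part and is not shown); the labels "end" and "mid" in the usage are hypothetical names for the two 1-neighborhood types that occur on a path:

```python
from collections import Counter


def threshold_equivalent(types_a, types_b, m):
    """Test threshold (r, m)-equivalence from precomputed local r-types.

    types_a, types_b -- one hashable label per element, naming the
    isomorphism type of that element's r-neighborhood.
    """
    count_a, count_b = Counter(types_a), Counter(types_b)
    # Types absent from both structures have count 0 in each, so only
    # types occurring somewhere need checking.
    for tau in set(count_a) | set(count_b):
        same_count = count_a[tau] == count_b[tau]
        both_above = count_a[tau] > m and count_b[tau] > m
        if not (same_count or both_above):
            return False
    return True
```

On paths with 4 and 5 vertices, both have two endpoints, and they have 2 and 3 interior vertices respectively, so they are threshold (1, 1)-equivalent but not threshold (1, 2)-equivalent.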

Theorem 5.7 ([FSV95]). For each $k, l > 0$, there exist $r, m > 0$ such that if $\mathfrak{A}$ and $\mathfrak{B}$ are threshold $(r, m)$-equivalent and every element has degree at most $l$, then they satisfy the same first order formulae up to quantifier rank $k$, written $\mathfrak{A} \equiv_k \mathfrak{B}$. Furthermore, $r$ depends only on $k$.

We refer the reader to [FSV95] for a discussion comparing the Fagin-Stockmeyer-Vardi theorem with Hanf's theorem in the context of applications to finite model theory. In particular, neither theorem seems to imply the other.

The Hanf locality lemma for formulae having a single free variable has a 
simple form and is an easy consequence of Thm. 5.7. 

Lemma 5.8. Notation as above. Let $\varphi(x)$ be a formula of quantifier depth $q$. Then there is a radius $r$ and threshold $t$ such that if $\mathfrak{A}$ and $\mathfrak{B}$ have the same multiset of local types up to threshold $t$, and the elements $a \in \mathfrak{A}$ and $b \in \mathfrak{B}$ have the same local type up to radius $r$, then

$$\mathfrak{A} \models \varphi(a) \iff \mathfrak{B} \models \varphi(b).$$

See [Lin05] for an application to computing simple monadic fixed points on 
structures of bounded degree in linear time. 

Next we come to Gaifman's version of locality. 

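Theorem 5.9 ([Gai82]). Every FO formula $\varphi(\mathbf{x})$ over a relational vocabulary is equivalent to a Boolean combination of

1. local formulae around $\mathbf{x}$, and

2. sentences of the form

$$\exists x_1 \cdots \exists x_s \Big( \bigwedge_{i=1}^{s} \psi(x_i) \wedge \bigwedge_{1 \le i < j \le s} d(x_i, x_j) > 2r \Big),$$

where the $\psi$ are r-local.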
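The sentences in item 2 of Theorem 5.9 can be checked by brute force on a small graph. In the sketch below, the Python predicate `psi` stands in for the r-local formula $\psi$ (an assumption: it is evaluated as a black box rather than derived from a formula), and unreachable pairs are treated as infinitely far apart:

```python
from collections import deque
from itertools import combinations


def bfs_distances(adjacency, source):
    """Shortest-path distances from one element in the Gaifman graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def basic_local_sentence_holds(adjacency, psi, s, r):
    """Do s elements exist, each satisfying psi, pairwise at distance > 2r?"""
    candidates = [a for a in adjacency if psi(a)]
    dist = {a: bfs_distances(adjacency, a) for a in candidates}
    for chosen in combinations(candidates, s):
        if all(dist[x].get(y, float("inf")) > 2 * r
               for x, y in combinations(chosen, 2)):
            return True
    return False
```

On a path with 7 vertices, taking `psi` to mean "has degree 2" and r = 1, two interior vertices at distance greater than 2 exist, but three such vertices with pairwise distance greater than 2 do not.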
In words, for every first order formula, there is an r such that the truth of the formula on a structure depends only on the number of elements having disjoint r-neighborhoods that satisfy certain local formulas. This again expresses the "bounded number of local properties" feature that limits first order logic.

The following normal form for first order logic was developed in an attempt to merge some of the ideas from Hanf and Gaifman locality.

Theorem 5.10 ([SB99]). Every first-order sentence is logically equivalent to one of the form

$$\exists x_1 \cdots \exists x_l \, \forall y \, \varphi(\mathbf{x}, y),$$

where $\varphi$ is local around $y$.

5.2 Simple Monadic LFP and Conditional Independence

In this section, we exploit the limitations described in the previous section to 
build conceptual bridges from least fixed point logic to the Markov-Gibbs pic- 
ture of the preceding section. At first, this may seem to be an unlikely union. But 
we will establish that there are fundamental conceptual relationships between 
the directed Markovian picture and least fixed point computations. The key is 
to see the constructions underlying least fixed point computations through the 
lens of influence propagation and conditional independence. In this section, 
we will demonstrate this relationship for the case of simple monadic least fixed points, namely, FO(LFP) formulae without any nesting or simultaneous induction, in which the LFP relation being constructed is monadic. In later sections, we show how to deal with complex fixed points as well.

We wish to build a view of fixed point computation as an information propa- 
gation algorithm. In order to do so, let us examine the geometry of information 
flow during an LFP computation. At stage zero of the fixed point computation, 
none of the elements of the structure are in the relation being computed. At the 
first stage, some subset of elements enters the relation. This changes the local neighborhoods of these elements, and the vertices that lie in these local neighborhoods change their local type. Due to the global changes in the multiset of local types, more elements in the structure become eligible for inclusion into the relation at the next stage. This process continues, and the changes "propagate" through the structure. Thus, the fundamental vehicle of this information propagation is that a fixed point computation $\varphi(R, x)$ changes local neighborhoods of elements at each stage of the computation.
This propagation

1. is directed, and

2. relies on a bounded number of local neighborhoods at each stage.

In other words, we observe that 

The influence of an element during LFP computation propagates in a simi- 
lar manner to the influence of a random variable in a directed Markov field. 

This correspondence is important to us. Let us try to uncover the under- 
lying principles that cause it. The directed property comes from the positivity 
of the first order formula that is being iterated. This ensures that once an ele- 
ment is inserted into the relation that is being computed, it is never removed. 
Thus, influence flows in the direction of the stages of the LFP computation. Fur- 
thermore, this influence flow is local in the following sense: the influence of an 
element can propagate throughout the structure, but only through its influence 
on various local neighborhoods. 

This correspondence is most striking in the case of bounded degree structures. In that case, we have only O(1) local types.

Lemma 5.11. On a graph of bounded degree, there is, for each fixed r, a fixed number of non-isomorphic neighborhoods with radius r. Consequently, there is only a fixed number of local r-types.

In order to determine whether an element in a structure satisfies a first order 
formula, we need (a) the multiset of local r-types in the structure (also known as its global type) for some value of r, and (b) the local type of the element. Furthermore, by the threshold version of Hanf locality (Lemma 5.8), we only need to know the multiset of local types up to a certain threshold.

For large enough structures, we will cross the Hanf threshold for the multi- 
set of r-types. At this point, we will be making a decision of whether an element 
enters the relation based solely on its local r-type. This type potentially changes 
with each stage of the LFP. At the time when this change renders the element 
eligible for entering the relation, it will do so. Once it enters the relation, it 
changes the local r-type of all those elements which lie within an r-neighborhood
of it, and such changes render them eligible, and so on. This is how the compu- 
tation proceeds, in a purely stage-wise local manner. This is a Markov property: 
the influence of an element upon another must factor entirely through the local 
neighborhood of the latter. 
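The stage-wise propagation described above can be sketched as an inflationary iteration, which for a positive (monotone) operator reaches the same least fixed point. Purely for illustration, we take reachability from a source, $\varphi(R, x) \equiv x = s \vee \exists y\,(E(y, x) \wedge R(y))$, as the iterated formula; its truth at x depends only on the 1-neighborhood of x. The helper names are hypothetical:

```python
def monadic_lfp_stages(adjacency, step):
    """Iterate a monadic positive operator until no new element enters.

    step(x, R) decides whether x belongs to the next stage given the current
    relation R. Elements are only ever added, never removed, so the result
    records the stage at which each element entered the fixed point.
    """
    stage_of = {}
    R = set()
    stage = 0
    while True:
        stage += 1
        new = {x for x in adjacency if x not in R and step(x, R)}
        if not new:
            return stage_of
        for x in new:
            stage_of[x] = stage
        R |= new


def reach_step(source, adjacency):
    """phi(R, x): x is the source, or some neighbor of x is already in R.

    Only the 1-neighborhood of x is consulted."""
    return lambda x, R: x == source or any(y in R for y in adjacency[x])
```

On the path 1, 2, 3, the elements enter at stages 1, 2 and 3 respectively, each becoming eligible only after a neighbor has entered; an isolated element never enters.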

In the more general case where degrees are not bounded, we still have fac- 
toring through local neighborhoods, except that we have to consider all the lo- 
cal neighborhoods in the structure. However, here the bounded nature of FO 
comes in. The FO formula that is being iterated can only express a property 
about some bounded number of such local neighborhoods. For example, in 
the Gaifman form, there are s distinguished disjoint neighborhoods that must 
satisfy some local condition. 

Remark 5.12. The same concept can be expressed in the language of sufficient 
statistics. Namely, knowing some information about certain local neighborhoods renders superfluous the rest of the information about variable values that have entered the relation in previous stages of the computation. In particular, Gaifman's theorem says that for first order properties, there exists a sufficient statistic that is gathered locally at a bounded number of elements. Knowing this statistic gives us conditional independence from the values of other elements that have already entered the relation previously, but not from elements that will enter the relation subsequently. This is similar to the directed Markov picture, where any variable is conditionally independent of its non-descendants given the values of its parents.
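The economy that conditional independence buys can be made concrete by counting free parameters. Assuming binary variables (an illustrative assumption), an unrestricted joint distribution over n variables needs $2^n - 1$ parameters, whereas a distribution that factorizes over a directed acyclic graph as $\prod_i P(x_i \mid \mathrm{pa}(x_i))$ needs one parameter per parent configuration of each variable:

```python
def full_joint_parameters(n):
    """Free parameters of an unrestricted joint distribution over n binary variables."""
    return 2 ** n - 1


def dag_parameters(parents):
    """Free parameters when the joint factorizes as prod_i P(x_i | pa(x_i)).

    parents -- dict mapping each binary variable to the list of its parents.
    Each conditional table has one free parameter per parent configuration.
    """
    return sum(2 ** len(pa) for pa in parents.values())
```

For a Markov chain on 20 binary variables this gives $1 + 2 \cdot 19 = 39$ parameters, against $2^{20} - 1 = 1048575$ for the unrestricted joint.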
[Figure 5.1 panel label: Conditional independence and factorization over a larger directed model called the ENSP (developed in Chapter 7).]

Figure 5.1: Range limited LFP computation process viewed as conditional independencies.
At this point, we have exhibited a correspondence between two apparently
very different formalisms. This correspondence is illustrated in Fig. 5.1. 
5.3 Conditional Independence in Complex Fixed Points

In the previous sections, we showed that the natural "factorization" of LFP into 
first order logic, coupled with the bounded local property of first order logic can 
be used to exhibit conditional independencies in the relation being computed. 

But the argument we provided was for simple fixed points having one free 
variable, namely, for monadic least fixed points. How can we show that this 
picture is the same for complex fixed points? We accomplish this in stages. 

1. First, we use the transitivity theorem for fixed point logic to move nested 
fixed points into simultaneous fixed points without nesting. 

2. Next, we use the simultaneous induction lemma for fixed point logic to 
encode the relation to be computed as a "section" of a single LFP relation 
of higher arity. 

Steps 1 and 2 involve standard constructions in finite model theory, which we 
recall in Appendix A. 

At this point, we are working with k-tuples, for a k fixed for all problem sizes, instead of single elements. This will change the distance properties of the resulting structure of k-tuples. Let us examine the case of a 2-ary relation that is being computed. In this case, we have the following situation. Every pair of elements occurs in the set of 2-tuples. This means that the neighborhood of every pair has size O(n), since for any element a of the structure, every other element b, c, d, ... occurs in a pair along with a.

This means that when there is a change to a 2-tuple containing a, that change affects the neighborhoods of O(n) other 2-tuples. At this point, we see that we are in the situation of O(n) range interactions. The key point to note is that we still have only poly(log n) parametrization. This is because even though the interactions are of O(n) range, the computation terminates in poly(n) steps, giving us an economical parametrization of the state space. Put another way, though the interactions are indeed between O(n) elements at a time, they are severely value limited, leading once again to poly(log n) parametrization. Recall the discussion of the two kinds of poly(log n) parametrizations (range limited and value limited) from Chapter 3. We will actually build a graphical model to give us the parametrization in Chapter 8.

We also need to ensure that our original structure has a relation that allows an order to be established on k-tuples. This does not pose a problem for encoding instances of k-SAT. The basic nature of information gathering and processing in LFP does not change when the arity of the computation rises. It merely adds the ability to gather polynomially more information at each stage, taken from O(n) variates at a time. But since the LFP terminates in polynomially many steps, the number of joint values taken by the system of n variables is only $2^{\mathrm{poly}(\log n)}$. Although each element sees O(n) variates at each stage of the LFP, it has the capability to utilize only a poly(log n) amount of that information in the following precise sense. A "true" joint distribution over n covariates takes $c^n$ different values, for some $c > 1$. It therefore requires $O(c^n)$ independent parameters to specify. This happens because the behavior of one variable is dependent on all $n - 1$ others simultaneously. In cases of joint distributions of n covariates which take only $2^{\mathrm{poly}(\log n)}$ values, this cannot be the case, since the resulting distribution can be parametrized far too economically.
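As a worked illustration of the gap in scale, take $c = 2$ and $n = 1024$, and, purely as an assumption for this example, instantiate $\mathrm{poly}(\log n)$ as $(\log_2 n)^2$:

```latex
\[
  c^{n} = 2^{1024},
  \qquad
  2^{\mathrm{poly}(\log n)} = 2^{(\log_2 1024)^2} = 2^{10^2} = 2^{100}.
\]
```

An unrestricted joint distribution over $2^{1024}$ joint values needs $2^{1024} - 1$ probabilities, whereas one supported on $2^{100}$ joint values needs at most $2^{100} - 1$ once its support is fixed.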

Remark 5.13. We could work over a product structure where LFP captures the class of polynomial time computable queries. In other words, we would work in a structure whose elements are k-tuples of our original structure. In this way, a k-ary LFP over the original structure would be a monadic LFP over this structure. The O(n) nature of interactions remains, but again the parametrization is only poly(log n).

Note that there are elegant ways to work with the space of equivalence classes of k-tuples with equivalence under first order logic with k variables.
For instance, one can consider a construction known as the canonical structure 
due originally to [DLW95] who used it to provide a model theoretic proof of 
the important theorem in [AV95] that P = PSPACE if and only if LFP = PFP. 
Note that this is for all structures, not just for ordered structures. 

The issue one faces is that there is a linear order on the canonical structure, which renders the Gaifman graph trivial (totally connected). See [Lib04, §11.5]
for more details on canonical structures. The simple scheme described above 
suffices for our purposes. 

Remark 5.14. Though the Immerman-Vardi theorem is usually stated for ordered 
structures, it holds for structures equipped with a successor relation (and no lin- 
ear ordering). See [LR03, §11.2, p. 204] where the result is stated for successor 
structures. The benefit of equipping our structures only with a successor struc- 
ture is that the Gaifman graph remains non-trivial. 

5.4 Aggregate Properties of LFP over Ensembles 

We have shown that any polynomial time computation will update its relation 
according to a certain Markov type property on the space of k-types of the underlying structure, after extracting a statistic from the local neighborhoods of
the underlying structure. Thus far, there is no probabilistic picture, or a distri- 
bution that we can analyze. We are only describing a fully deterministic com- 
putation. 

The distribution we seek will arise when we examine the aggregate behav- 
ior of LFP over ensembles of structures that come from ensembles of constraint 
satisfaction problems (CSPs) such as random k-SAT. When we examine the
properties in the aggregate of LFP running over ensembles, we will find the 
following. 

The "bounded number of local" property of each stage of monadic LFP computation manifests as conditional independencies in the distribution, making the distribution of solutions poly(log n)-parametrizable. Likewise, value limited interactions in higher arity LFP computations also lead to distributions of solutions that are poly(log n)-parametrizable.

This gives us the setting where we can exploit the full machinery of graphical 
models of Chapter 2. 
Before we examine the distributions arising from LFP acting on ensembles
of structures, we will bring in ideas from statistical physics into the proof. We 
begin this in the next chapter. 
6.1 Ensembles and Phase Transitions

The study of random ensembles of various constraint satisfaction problems (CSPs) is over two decades old, dating back at least to [CF86]. While a given CSP, say 3-SAT, might be NP-complete, many instances of the CSP might be quite easy to solve, even using fairly simple algorithms. Furthermore, such "easy" instances lie in certain well defined regimes of the CSP, while "harder" instances lie in clearly separated regimes. Thus, researchers were motivated to study randomly generated ensembles of CSPs having certain parameters that would specify which regime the instances of the ensemble belonged to. We will see this behavior in some detail for the specific case of the ensemble known as random k-SAT.

An instance of k-SAT is a propositional formula in conjunctive normal form

$$\Phi = C_1 \wedge C_2 \wedge \cdots \wedge C_m$$

having m clauses $C_j$, each of which is a disjunction of k literals taken from n variables $\{x_1, \ldots, x_n\}$. The decision problem of whether a satisfying assignment to the variables exists is NP-complete for $k \ge 3$. The ensemble known as random k-SAT consists of instances of k-SAT generated randomly as follows. An instance is generated by drawing each of the m clauses $\{C_1, \ldots, C_m\}$ uniformly from the $2^k \binom{n}{k}$ possible clauses having k variables. The entire ensemble of random k-SAT having m clauses over n variables will be denoted by $\mathrm{SAT}_k(n, m)$, and a single instance of this ensemble will be denoted by $\Phi_k(n, m)$. The clause density, denoted by $\alpha$ and defined as $\alpha := m/n$, is the single most important parameter that controls the geometry of the solution space of random k-SAT. Thus, we will mostly be interested in the case where every formula in the ensemble has clause density $\alpha$. We will denote this ensemble by $\mathrm{SAT}_k(n, \alpha)$, and an individual formula in it by $\Phi_k(n, \alpha)$.
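A minimal sketch of sampling from $\mathrm{SAT}_k(n, m)$ and $\mathrm{SAT}_k(n, \alpha)$, assuming clauses are drawn independently and encoded DIMACS-style as lists of nonzero integers ($+i$ for $x_i$, $-i$ for $\neg x_i$). Choosing k distinct variables uniformly and each sign by a fair coin makes every one of the $2^k \binom{n}{k}$ clauses equally likely. The function names are hypothetical:

```python
import random


def random_ksat(n, m, k, rng=random):
    """Draw one instance of SAT_k(n, m).

    Each of the m clauses is drawn independently and uniformly among the
    2^k * C(n, k) clauses on k distinct variables. A clause is a list of
    nonzero ints: +i stands for x_i and -i for NOT x_i.
    """
    clauses = []
    for _ in range(m):
        variables = rng.sample(range(1, n + 1), k)  # k distinct variables
        clauses.append([v if rng.random() < 0.5 else -v for v in variables])
    return clauses


def random_ksat_density(n, alpha, k, rng=random):
    """Draw one instance of SAT_k(n, alpha), with m = alpha * n rounded down."""
    return random_ksat(n, int(alpha * n), k, rng)
```

Rounding $m = \alpha n$ down when it is not an integer is a convention of this sketch, not of the text.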

Random CSPs such as k-SAT have attracted the attention of physicists because they model disordered systems such as spin glasses, where the Ising spin of each particle is a binary variable ("up" or "down") and must satisfy some constraints that are expressed in terms of the spins of other particles. The energy of such a system can then be measured by the number of unsatisfied clauses of a certain k-SAT instance, where the clauses of the formula model the constraints upon the spins. The case of zero energy then corresponds to a solution to the k-SAT instance. The following formulation is due to [MZ97]. First we translate the Boolean variables $x_i$ to Ising variables $S_i$ in the standard way, namely $S_i = -(-1)^{x_i}$. Then we introduce new variables $C_{li}$ as follows. The variable $C_{li}$ is equal to 1 if the clause $C_l$ contains $x_i$, it is $-1$ if the clause contains $\neg x_i$, and it is zero if neither appears in the clause. In this way, the sum $\sum_{i=1}^{n} C_{li} S_i$ measures the satisfiability of clause $C_l$. Specifically, if $\sum_{i=1}^{n} C_{li} S_i + k > 0$, the clause is satisfied by the Ising variables. The energy of the system is then measured by the Hamiltonian

$$H = \sum_{l=1}^{m} \delta\Big( \sum_{i=1}^{n} C_{li} S_i, \, -k \Big).$$

Here $\delta(i, j)$ is the Kronecker delta. Thus, satisfaction of the k-SAT instance translates to vanishing of this Hamiltonian. Statistical mechanics then offers techniques such as replica symmetry to analyze the macroscopic properties of this ensemble.
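The [MZ97] energy function can be evaluated directly from the definitions above. The sketch assumes clauses are lists of signed integers over k distinct variables ($+i$ for $x_i$, $-i$ for $\neg x_i$), an encoding chosen for illustration; since each literal contributes $+1$ when satisfied and $-1$ otherwise, a clause hits $-k$ exactly when it is violated, so $H$ counts violated clauses:

```python
def ising_spins(assignment):
    """Map Boolean values x_i in {0, 1} to spins S_i = -(-1)^{x_i}: 1 -> +1, 0 -> -1."""
    return {i: -((-1) ** x) for i, x in assignment.items()}


def coupling(clause, i):
    """C_{li}: +1 if the clause contains x_i, -1 if it contains NOT x_i, 0 otherwise."""
    if i in clause:
        return 1
    if -i in clause:
        return -1
    return 0


def hamiltonian(clauses, assignment, k):
    """H = sum_l delta(sum_i C_{li} S_i, -k), the number of violated clauses."""
    spins = ising_spins(assignment)
    energy = 0
    for clause in clauses:
        field = sum(coupling(clause, i) * s for i, s in spins.items())
        energy += int(field == -k)  # Kronecker delta
    return energy
```

The cost is O(mn) because every coupling is evaluated; a sparse version would only sum over the k variables of each clause, but the dense form mirrors the formula term by term.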

Also very interesting from the physicist's point of view is the presence of a sharp phase transition [CKT91, MSL92] (see also [KS94]) between satisfiable and unsatisfiable regimes of random k-SAT. Namely, empirical evidence suggested that the properties of this ensemble undergo a clearly defined transition when the clause density is varied. This transition is conjectured to be as follows. For each value of k, there exists a transition threshold $\alpha_c(k)$ such that with probability approaching 1 as $n \to \infty$ (called the thermodynamic limit by physicists),

• if $\alpha < \alpha_c(k)$, an instance of random k-SAT is satisfiable. Hence this region is called the SAT phase.

• If $\alpha > \alpha_c(k)$, an instance of random k-SAT is unsatisfiable. This region is known as the unSAT phase.
There has been intense research attention on determining the numerical value of the threshold between the SAT and unSAT phases as a function of k. [Fri99] provides a sharp but non-uniform construction (namely, the value $\alpha_c$ is a function of the problem size, and is conjectured to converge as $n \to \infty$). Bounds on the threshold as a function of k have been obtained using the first moment method [MA02] and the second moment method [AP04]; the latter become tighter as k gets larger.



6.2 The d1RSB Phase

More recently, another thread on this crossroad has originated once again from 
statistical physics and is most germane to our perspective. This is the work in 
the progression [MZ97], [BMW00], [MZ02], and [MPZ02] that studies the evolution of the solution space of random k-SAT as the constraint density increases
towards the transition threshold. In these papers, physicists have conjectured 
that there is a second threshold that divides the SAT phase into two — an "easy" 
SAT phase, and a "hard" SAT phase. In both phases, there is a solution with 
high probability, but while in the easy phase one giant connected cluster of 
solutions contains almost all the solutions, in the hard phase this giant clus- 
ter shatters into exponentially many communities that are far apart from each 
other in terms of least Hamming distance between solutions that lie in distinct 
communities. Furthermore, these communities shrink and recede maximally 
far apart as the constraint density is increased towards the SAT-unSAT thresh- 
old. As this threshold is crossed, they vanish altogether. 
As the clause density is increased, a picture known as the "1RSB hypothesis" emerges that is illustrated in Fig. 6.1, and described below.

RS For $\alpha < \alpha_d$, a problem has many solutions, but they all form one giant cluster within which going from one solution to another involves flipping only a finite (bounded) set of variables. This is the replica symmetric phase.

d1RSB At some value $\alpha = \alpha_d$, which is below $\alpha_c$, it has been observed that the space of solutions splits up into "communities" of solutions such that solutions within a community are close to one another, but are far away from the solutions in any other community. This effect is known as shattering [ACO08]. Within a community, flipping a bounded finite number of variable assignments in one satisfying assignment takes one to another satisfying assignment. But to go from one satisfying assignment in one community to a satisfying assignment in another, one has to flip a fraction of the set of variables and therefore encounters what physicists would consider an "energy barrier" between states. This is the dynamical one step replica symmetry breaking phase.

unSAT Above the SAT-unSAT threshold, the formulas of random k-SAT are unsatisfiable with high probability.

Using statistical physics methods, [KMRT+07] obtained another phase that lies between d1RSB and unSAT. In this phase, known as 1RSB (one step replica symmetry breaking), there is a "condensation" of the solution space into a subexponential number of clusters, and the sizes of these clusters go to zero as the
transition occurs, after which there are no more solutions. This phase has not 
been proven rigorously thus far to our knowledge and we will not revisit it in 
this work. 

The 1RSB hypothesis has been proven rigorously for high values of k. Specifically, the existence of the d1RSB phase has been proven rigorously for the case of $k \ge 8$, starting with [MMZ05] (see also [DMMZ08]), who showed the existence of clusters in a certain region of the SAT phase using first moment methods. Later, [ART06] rigorously proved that there exist exponentially many clusters in the d1RSB phase and showed that within any cluster, the fraction of variables that take the same value in the entire cluster (the so-called frozen variables) goes to one as the SAT-unSAT threshold is approached. Further, [ACO08] obtained analytical expressions for the threshold at which the solution space of random k-SAT (as well as two other CSPs, random graph coloring and random hypergraph 2-colorability) shatters, and confirmed the O(n) Hamming separation between clusters.




[Figure 6.1: clause density axis $\alpha$, with marks at $\alpha_d$ and $\alpha_c$.]

Figure 6.1: The clustering of solutions just before the SAT-unSAT threshold. Below $\alpha_d$, the space of solutions is largely connected. Between $\alpha_d$ and $\alpha_c$, the solutions break up into exponentially many communities. Above $\alpha_c$, there are no more solutions, which is indicated by the unfilled circle.

In summary, in the region of constraint density $\alpha \in [\alpha_d, \alpha_c]$, the solution space consists of exponentially many communities of solutions, and moving from one community to another requires flipping a fraction of the variable assignments.

6.2.1 Cores and Frozen Variables 

In this section, we reproduce results about the distribution of variable assignments within each cluster of the d1RSB phase from [MMW07], [ART06], and [ACO08].

We first need the notion of the core of a cluster. Given any solution in a cluster, one may obtain the core of the cluster by "peeling away" variable assignments that, loosely speaking, occur only in clauses that are satisfied by other variable assignments. This process leads to the core of the cluster.

To get a formal definition, first we define a partial assignment of a set of variables $\{x_1, \ldots, x_n\}$ as an assignment of each variable to a value in $\{0, 1, *\}$. The * assignment is akin to a "joker state" which can take whichever value is most useful in order to satisfy the k-SAT formula.

Next, we say that a variable in a partial assignment is free when each clause it occurs in has at least one other variable that satisfies the clause or that is assigned *.

Finally, to obtain the core of a cluster, we repeat the following starting with 
any solution in the cluster: if a variable is free, assign it a *. 

This process will eventually lead to a fixed point, and that is the core of the 
cluster. We may easily see that the core is not dependent upon the choice of the 
initial solution. 
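The peeling procedure can be sketched as a loop that stars free variables until nothing changes. Clauses use a signed-integer encoding ($+i$ for $x_i$, $-i$ for $\neg x_i$), and representing * by the string "*" is an assumption made for the example:

```python
STAR = "*"


def literal_true(lit, assignment):
    """True if the literal is satisfied by a 0/1 value (never for a starred variable)."""
    value = assignment[abs(lit)]
    if value == STAR:
        return False
    return value == 1 if lit > 0 else value == 0


def is_free(var, clauses, assignment):
    """A variable is free if every clause containing it has some other variable
    that either satisfies the clause or is already assigned *."""
    for clause in clauses:
        if not any(abs(lit) == var for lit in clause):
            continue
        covered = any(
            abs(lit) != var
            and (assignment[abs(lit)] == STAR or literal_true(lit, assignment))
            for lit in clause
        )
        if not covered:
            return False
    return True


def core(clauses, solution):
    """Peel a satisfying assignment down to its core by repeatedly starring free variables."""
    assignment = dict(solution)
    changed = True
    while changed:
        changed = False
        for var in sorted(assignment):
            if assignment[var] != STAR and is_free(var, clauses, assignment):
                assignment[var] = STAR
                changed = True
    return assignment
```

In the small formula used below, the first three clauses force $x_1 = x_2 = x_3$, so both solutions with $x_1 = x_2 = x_3 = 1$ peel to the same core, with $x_4$ starred and $x_1, x_2, x_3$ frozen.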

What does the core of a cluster look like? Recall that the core is itself a 
partial assignment, with each variable being assigned a 0, 1 or a *. Of obvious 
interest are those variables that are assigned 0 or 1. These variables are said to be 
frozen. Note that since the core can be arrived at starting from any choice of an 
initial solution in the cluster, it follows that frozen variables take the same value 
throughout the cluster. For example, if the variable $x_i$ takes value 1 in the core of a cluster, then every solution lying in the cluster has $x_i$ assigned the value 1. The non-frozen variables are those that are assigned the value * in the core.
These take both values 0 and 1 in the cluster. Clearly the number of * variables 
is a measure of the internal entropy (and therefore the size) of a cluster since it 
is only these variables whose values vary within the cluster. 

A priori, we have no way to tell that the core will not be the all * partial
assignment. Namely, we do not know whether there are any frozen variables at 
all. However, [ART06] proved that for high enough values of k, with probability 
going to 1 in the thermodynamic limit, almost every variable in a core is frozen 
as we increase the clause density towards the SAT-unSAT threshold. 

Theorem 6.1 ([ART06]). For every $r \in [0, 1/2]$ there is a constant $k_r$ such that for all $k \ge k_r$, there exists a clause density $\alpha(k, r) < \alpha_c$ such that for all $\alpha \in [\alpha(k, r), \alpha_c]$, asymptotically almost surely

1. every cluster of solutions of $\Phi_k(n, \alpha n)$ has at least $(1 - r)n$ frozen variables,

2. fewer than $rn$ variables take the value *.

This gives us the following corollary.

Corollary 6.2 ([ART06]). For every $k \ge 9$, there exists $\alpha < \alpha_c(k)$ such that with high probability, every cluster of the solution space of $\Phi_k(n, \alpha n)$ has frozen variables.

Note that this picture is known to hold only for $k \ge 9$ and is an open question for $k < 9$. See also the remark at the end of this section.

We end this section with a physical picture of what forms a core. If a formula $\Phi$ has a core with C clauses, then these clauses must have literals that come from a set of at most C variables. By bounding the probability of this event, [MMW07] obtained a lower bound on the size of cores. The bound is linear, which means that when non-trivial cores do exist ([ART06] proved their existence for $k \ge 9$), they must involve a fraction of all the variables in the formula. In other words, a core may be thought of as the onset of a large single interaction of degree O(n) among the variables. Furthermore, this core is instantiated amply in the solution space (by that we mean it takes exponentially many values in the exponentially many clusters of the d1RSB phase). As the reader may imagine after reading the previous chapters, this sort of interaction cannot be dealt with by LFP algorithms. We will need more work to make this precise, but informally, cores are too large to pass through the bottlenecks that the stage-wise first order LFP algorithms create.

This may also be interpreted as follows. Algorithms based on LFP can tackle
long range interactions between variables, but only when they can be factored
into interactions of degree poly(log n) or are value limited. But the appearance
of cores is equivalent to the onset of O(n) degree interactions which cannot be
further factored into poly(log n) degree interactions, and are ample. Such ample
irreducible O(n) interactions, caused by increasing the clause density
sufficiently, cannot be dealt with using an LFP algorithm.

We have already noted that this is because LFP algorithms factor through
first order computations, and in a first order computation, the decision of whether 
an element is to enter the relation being computed is based on information col- 
lected from local neighborhoods and combined in a bounded fashion. This bot- 
tleneck is too small for a core to factor through in range limited LFP. The am- 
pleness precludes value limited interactions also as we shall see. The precise 
statement of this intuitive picture will be provided in the next chapter when we 
build our conditional independence hierarchies. 

The freezing of variables in cores is known to happen only for k ≥ 9
[ART06]. It remains open for k ≤ 8. Indeed, for low values of k such
as k = 2, 3, there is empirical evidence that this phenomenon does not
take place [MMW05]; see also the discussion in [ART06, §2]. Hence, our
separation of complexity classes needs the regime k ≥ 9.

6.2.2 Performance of Known Algorithms in the d1RSB Phase

We end this chapter with a brief overview of the performance of known algo- 
rithms as a function of the clause density, and pointers to more detailed surveys. 

Beginning with [CKT91] and [MSL92], there has been an understanding that 
hard instances of random k-SAT tend to occur when the constraint density α
is near the transition threshold, and that this behavior is similar to phase
transitions in spin glasses [KS94]. Now that we have surveyed the known results
about the geometry of the space of solutions in this region, we turn to the ques- 
tion of how the two are related. 

It has been empirically observed that the onset of the d1RSB transition seems
to coincide with the constraint density where traditional solvers tend to exhibit
exponential slowdown; see [ACO08] and [CO09]. See also [CO09] for the best
current algorithm along with a comparison of various other algorithms to it.
Thus, while both regimes in SAT have solutions with high probability, the ease
of finding a solution differs quite dramatically on traditional SAT solvers due to
a clustering of the solution space into numerous communities that are far apart
from each other in terms of Hamming distance. In particular, for clause densities
above O(2^k/k), no algorithms are known to produce solutions in polynomial
time with non-vanishing probability, on the basis of either rigorous or empirical
analysis, or any other evidence [CO09]. Compare this to the SAT-unSAT
threshold, which is asymptotically 2^k ln 2. Thus, well below the SAT-unSAT
threshold, in regimes where we know solutions exist, we are currently unable to
find them in polynomial time. Our work will explain that indeed, this is
fundamentally a limitation of polynomial time algorithms. Specifically, in such
phases (for k ≥ 9), the solution space geometry is not expressible as a mixture
of range or value limited poly(log n)-parametrizable pieces. This is because in
the d1RSB phase, the distribution of solutions is both irreducibly correlated at
ranges O(n), and ample, precluding both range and value limited
parametrizations.

Please see [CO09] for the best known algorithm that does solve SAT instances
with non-vanishing probability for densities up to 2^k ω(k)/k for any sequence
ω(k) → ∞. See [ACO08] for proofs that the clause density where all known
polynomial time algorithms fail on NP-complete problems such as k-SAT and
graph coloring coincides with the onset of the d1RSB phase in these problems.
This clause density threshold for the onset of the d1RSB phase is (2^k/k) ln k
[ACO08]. The earlier [ART06] had established the existence of shattering and
freezing of variables within cores for α = Θ(2^k).

The significance of the value of k. By the results of [ART06] and [ACO08, §2.1,
Rem. 2], we are guaranteed the presence of the full d1RSB phenomena only for k ≥ 9
and clause density above (2^k/k) ln k.
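For a sense of scale, the following Python snippet evaluates the leading-order density scales mentioned above: 2^k/k, (2^k/k) ln k and the SAT-unSAT value 2^k ln 2. It drops all constants and lower-order terms, so for a concrete k such as 9 the numbers only indicate relative magnitudes. The function name `density_scales` is hypothetical.

```python
import math


def density_scales(k: int) -> dict:
    """Leading-order clause-density scales quoted in this section.
    Constants and lower-order corrections are dropped, so for a fixed k
    these numbers are only indicative."""
    return {
        "algorithmic_barrier": 2**k / k,          # the O(2^k / k) scale discussed above
        "d1rsb_onset": (2**k / k) * math.log(k),  # (2^k / k) ln k, as in [ACO08]
        "sat_unsat": 2**k * math.log(2),          # alpha_c ~ 2^k ln 2
    }


for k in (9, 10, 12):
    scales = density_scales(k)
    print(k, {name: round(value, 1) for name, value in scales.items()})
```

For k = 9 this gives roughly 57, 125 and 355, so at leading order the d1RSB regime begins well below the SAT-unSAT threshold.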

Hence, for our separation of complexity classes, we will work with random
k-SAT in the k ≥ 9 regime, and the clause density sufficiently high so that we
are in the d1RSB phase. We will require all known properties of the d1RSB
phase, namely, the exponentially many clusters, the freezing of variables within
clusters, and the O(n) variable changes required to move from one cluster to
another. These properties are not known to hold except for k ≥ 9 and clause
densities above (2^k/k) ln k.
It should be noted that there is empirical evidence that the d1RSB phase is
not present in random 3-SAT in the following sense. The cores in the clusters
of random 3-SAT are trivial. By that we mean that they tend to be the all-* core,
unlike k ≥ 9 where [ART06] show the existence of nontrivial cores for almost
all clusters after the d1RSB threshold.

We should also point out that the experimental behavior of algorithms for
k-SAT is largely characterized for lower values of k = 2, 3, 4, where the full
d1RSB picture is not known to hold. For instance, the experimental behavior
of algorithms reported in [MRTS07] is on random 4-SAT. See also [KMRT+06],
where experiments are reported on 4-SAT. We are not aware of experimental
work that shows the efficacy (even under mild requirements) of any algorithm
on k ≥ 9 after the onset of the d1RSB phase with nontrivial cores.

Incomplete algorithms are a class that do not always find a solution when 
it exists, nor do they indicate the lack of solution except to the extent that 
they were unable to find one. Incomplete algorithms are obviously very im- 
portant for hard regimes of constraint satisfaction problems since we do not 
have complete algorithms in these regimes that have economical running times. 
More recently, a breakthrough for incomplete algorithms in this field came with
[MPZ02], who used the cavity method from spin glass theory to construct an
algorithm named survey propagation that does very well on instances of random
k-SAT with constraint density above the aforementioned clustering threshold,
and continues to perform well very close to the threshold α_c, for low values of
k. Survey propagation seems to scale as n log n in this region. The algorithm
uses the 1RSB hypothesis about the clustering of the solution space into
numerous communities. The original work reported in [MPZ02] was on 3-SAT.
The behavior of survey propagation for higher values of k is still being
researched.
We will use factor graphs as a convenient means to encode various properties
of the random k-SAT ensemble. In this section we introduce the factor graph
ensembles that represent random k-SAT. Our treatment of this section follows
[MM09, Chapter 9].

Definition 7.1. The random k-factor graph ensemble, denoted by G_k(n, m), consists
of graphs having n variable nodes and m function nodes constructed as follows.
A graph in the ensemble is constructed by picking, for each of the m function
nodes in the graph, a k-tuple of variables uniformly from the $\binom{n}{k}$ possibilities
for such a k-tuple chosen from n variables.

Graphs constructed in this manner may have two function nodes connected
to the same k-tuple of variables. In this ensemble, function nodes all have
degree k, while the degree of the variable nodes is a random variable with
expectation km/n.
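Here is a minimal Python sketch of sampling from G_k(n, m) as in Definition 7.1. It assumes variables are labeled 0, …, n − 1 and that each function node is stored as a sorted tuple of its k variables; both representation choices, and the function names, are ours. It also illustrates the degree expectation km/n stated above.

```python
import random
from typing import List, Tuple


def sample_gk_nm(n: int, m: int, k: int, rng: random.Random) -> List[Tuple[int, ...]]:
    """Sample a graph from G_k(n, m): each of the m function nodes picks a
    k-subset of the n variable nodes uniformly at random, independently of
    the others (so two function nodes may share the same k-tuple)."""
    return [tuple(sorted(rng.sample(range(n), k))) for _ in range(m)]


def variable_degrees(n: int, function_nodes: List[Tuple[int, ...]]) -> List[int]:
    """Degree of each variable node = number of function nodes touching it."""
    degrees = [0] * n
    for node in function_nodes:
        for var in node:
            degrees[var] += 1
    return degrees


rng = random.Random(0)
n, m, k = 10_000, 42_000, 3
graph = sample_gk_nm(n, m, k, rng)
degrees = variable_degrees(n, graph)
# The degrees always sum to k*m, so their average is exactly k*m/n = 12.6;
# each individual degree is random with that expectation.
print(sum(degrees) / n)
```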

Definition 7.2. The random (k, α)-factor graph ensemble, denoted by G_k(n, α), consists
of graphs constructed as follows. For each of the $\binom{n}{k}$ k-tuples of variables,
a function node that connects to only these k variables is added to the graph
with probability $\alpha n / \binom{n}{k}$.

In this ensemble, the number of function nodes is a random variable with
expectation αn, and the degree of variable nodes is a random variable with
expectation αk.
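Definition 7.2 can be sketched the same way. This Python version enumerates all C(n, k) k-tuples, so it only works for small n; it illustrates the definition rather than serving as an efficient sampler. The name `sample_gk_alpha` is hypothetical.

```python
import itertools
import math
import random
from typing import List, Tuple


def sample_gk_alpha(n: int, k: int, alpha: float, rng: random.Random) -> List[Tuple[int, ...]]:
    """Sample G_k(n, alpha): each of the C(n, k) k-tuples of variables gets its
    own function node independently with probability alpha*n / C(n, k).
    Enumerates all k-tuples, so this is only practical for small n."""
    p = alpha * n / math.comb(n, k)
    return [t for t in itertools.combinations(range(n), k) if rng.random() < p]


rng = random.Random(1)
n, k, alpha = 40, 3, 2.0
counts = [len(sample_gk_alpha(n, k, alpha, rng)) for _ in range(100)]
# The number of function nodes is Binomial(C(n, k), p), with mean alpha*n = 80.
print(sum(counts) / len(counts))
```

With n = 40, k = 3 and α = 2, the average number of function nodes over the runs should be close to αn = 80.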

We will be interested in the case of the thermodynamic limit of n, m → ∞
with the ratio α := m/n being held constant. In this case, both the ensembles
converge in the properties that are important to us, and both can be seen as the
underlying factor graph ensembles to our random k-SAT ensemble SAT_k(n, α)
(see Chapter 6 for definitions and our notation for random k-SAT ensembles).

With the definitions in place, we are ready to describe two properties of 
random graph ensembles that are pertinent to our problem. 

7.1 Properties of Factor Graph Ensembles 

The first property provides us with intuition on why algorithms find it so hard 
to put together local information to form a global perspective in CSPs. 

7.1.1 Locally Tree-Like Property 

We have seen in Chapter 5 that the propagation of influence of variables during
an LFP computation is stagewise-local. This is really the fundamental limitation
of LFP that we seek to exploit. In order to understand why this is a limitation,
we need to examine what local neighborhoods of the factor graphs underlying
NP-complete problems like k-SAT look like in hard phases such as d1RSB.
In such phases, there are many extensive (meaning O(n)) correlations between
variables that arise due to loops of size of order log n and above.

However, remarkably, such graphs are locally trivial. By that we mean that
there are no cycles in an O(1) sized neighborhood of any vertex as the size of the
graph goes to infinity [MM09, §9.5]. One may demonstrate this for the
Erdős–Rényi random graph as follows. Here, there are n vertices, and there is an
edge between any two with probability p = c/n, where c is a constant that
parametrizes the density of the graph. Edges are "drawn" uniformly and
independently of each other. Consider the probability of a certain graph (V, E)
occurring as a subgraph of the Erdős–Rényi graph. Such a graph can occur in
$\binom{n}{|V|}$ positions. At each position, the probability of the graph structure
occurring is

$$p^{|E|}(1-p)^{\binom{|V|}{2}-|E|}.$$

Applying Stirling's approximation, we see that such a graph occurs asymptotically
$O(n^{|V|-|E|})$ times. If the graph is connected, |V| ≤ |E| + 1, with equality
only for trees. Thus, in the limit n → ∞, a finite connected graph that contains
a cycle (so that |V| ≤ |E|) occurs only O(1) times in the whole graph, and so has
vanishing probability of occurring in finite neighborhoods of any given element.
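The local triviality of sparse Erdős–Rényi graphs can be checked numerically. The Python sketch below samples G(n, c/n) and counts triangles, the smallest cycles. Since the expected number of triangles is C(n, 3)(c/n)^3 → c^3/6, a constant, the fraction of vertices lying on a triangle should shrink as n grows. The function names are hypothetical, and the O(n^2) pair loop is chosen for clarity, not speed.

```python
import itertools
import random
from typing import List, Set


def erdos_renyi(n: int, c: float, rng: random.Random) -> List[Set[int]]:
    """Adjacency sets of G(n, p) with p = c/n: every pair of vertices is joined
    independently with probability p."""
    p = c / n
    adj: List[Set[int]] = [set() for _ in range(n)]
    for u, v in itertools.combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def count_triangles(adj: List[Set[int]]) -> int:
    """Count each triangle u < v < w exactly once."""
    total = 0
    for u in range(len(adj)):
        for v in adj[u]:
            if v > u:
                total += sum(1 for w in adj[u] & adj[v] if w > v)
    return total


rng = random.Random(2)
c = 3.0
for n in (500, 1000, 2000):
    t = count_triangles(erdos_renyi(n, c, rng))
    # At most 3t vertices lie on a triangle, so this fraction tends to 0.
    print(n, t, 3 * t / n)
```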

In short, if only local neighborhoods are examined, the ensemble G_k(n, m)
and the corresponding ensemble of random trees T_k(n, m) are indistinguishable
from each other.

Theorem 7.3. Let G be a randomly chosen graph in the ensemble G_k(n, m), and i
be a uniformly chosen node in G. Then the r-neighborhood of i in G converges in
distribution to T_k(n, m) as n → ∞.

Let us see what this means in terms of the information such graphs divulge 
locally. The simplest local property is the degree of each element. These are, of course,
available through local inspection. The next would be small connected sub- 
graphs (triangles, for instance). But even this next step is not available. In 
other words, such random graphs do not provide any of their global proper- 
ties through local inspection at each element. 

Let us think about what this implies. We know from the onset of cores and
frozen variables in the d1RSB phase of k-SAT that there are strong correlations
between blocks of variables of size O(n) in that phase. However, the long loops
that carry these correlations are invisible when we inspect local neighborhoods
of a fixed finite size, as the problem size grows.



7.1.2 Degree Profiles in Random Graphs 

The degree of a variable node in the ensemble G_k(n, m) is a random variable.
We wish to understand the distribution of this random variable. The expected
value of the fraction of variables in G_k(n, m) having degree d is the same as the
probability that a single variable node has degree d, both being equal to

$$P(\deg v_i = d) = \binom{m}{d} p^d (1-p)^{m-d}, \quad \text{where } p = \frac{k}{n}.$$

In the large graph limit we get

$$\lim_{n \to \infty} P(\deg v_i = d) = e^{-k\alpha} \frac{(k\alpha)^d}{d!}.$$

In other words, the degree is asymptotically a Poisson random variable.
A corollary is that the maximum degree of a variable node is almost surely
O(log n) in the large graph case.
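The Poisson limit of the degree distribution can be checked directly. This Python sketch compares the exact binomial pmf, with p = k/n, to the Poisson pmf of mean kα. The parameter values (α = 4, k = 3) and the function names are arbitrary choices for illustration.

```python
import math


def binomial_degree_pmf(d: int, n: int, m: int, k: int) -> float:
    """P(deg v_i = d) in G_k(n, m): each of the m function nodes contains a
    given variable with probability p = k/n, independently."""
    p = k / n
    return math.comb(m, d) * p**d * (1 - p) ** (m - d)


def poisson_pmf(d: int, lam: float) -> float:
    return math.exp(-lam) * lam**d / math.factorial(d)


def max_gap(n: int, alpha: float, k: int, d_max: int = 40) -> float:
    """Largest pointwise gap between the two pmfs over degrees 0..d_max."""
    m = round(alpha * n)
    lam = k * m / n
    return max(abs(binomial_degree_pmf(d, n, m, k) - poisson_pmf(d, lam))
               for d in range(d_max + 1))


for n in (100, 1_000, 100_000):
    print(n, max_gap(n, alpha=4.0, k=3))
```

The printed gap should shrink as n grows, in line with the limit above.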

Lemma 7.4. The maximum variable node degree d_max in G_k(n, m) is asymptotically
almost surely O(log n). In particular, it asymptotically almost surely satisfies the
following

$$d_{\max} \le \frac{k\alpha e\, z}{\log(z/\log z)}\,\bigl(1 + o(1)\bigr), \qquad (7.1)$$

where z = (log n)/(kαe).

Proof. See [MM09, p. 184] for a discussion of this upper bound, as well as a
lower bound.
8. Separation of Complexity Classes



We have built a framework that connects ideas from graphical models, logic, 
statistical mechanics, and random graphs. We are now ready to begin our final 
constructions that will yield the separation of complexity classes. 

We have described the fundamental similarity between range limited and
value limited distributions in Chapter 3. Both are hampered by the same
underlying property: in spite of being distributions on n covariates, they
can be specified with only 2^{poly(log n)} parameters. In our terminology, they are
both poly(log n)-parametrizable. Informally, this means that their joint
distribution behaves like the joint distribution of only poly(log n) covariates
instead of n covariates.

In light of the above, we first consider the case of range limited
poly(log n)-parametrizations. We return to value limited poly(log n)-parametrizations
just before the final separation of complexity classes in Section 8.5.

8.1 Measuring Conditional Independence in Range Limited Models

Our central concern with respect to range limited models is to understand which 
variable interactions in a system are irreducible — namely, those that cannot be 
expressed in terms of interactions between smaller sets of variables with con- 
ditional independencies between them. Such irreducible interactions can be 
2-interactions (between pairs), 3-interactions (between triples), and so on, up to 
n-interactions between all n variables simultaneously. 

A joint distribution encodes the interaction of a system of n variables. What
would happen if all the direct interactions between variables in the system were
of less than a certain finite range k, with k < n? In such a case, the "jointness"
of the covariates really would lie at a lower "level" than n. We would like
to measure the "level" of conditional independence in a system of interacting 
variables by inspecting their joint distribution. At level zero of this "hierarchy", 
the covariates should be independent of each other. At level n, they are coupled 
together n at a time, without the possibility of being decoupled. In this way, 
we can make statements about how deeply entrenched the conditional inde- 
pendence between the covariates is, or dually, about how large the set of direct 
interactions between variables is. 

Remark 8.1. Similarly, if the variables did interact n at a time, but took only
2^{poly(log n)} joint values, the "jointness" of the distribution would lie at a lower
level than n. In both cases above, as stated in Chapter 3, the n covariates do not
display the behavior of a typical joint distribution of n variables. Instead, they
behave in ways similar to a set of poly(log n) covariates.

When the largest irreducible interactions are k-interactions, the distribution
can be parametrized with n2^k independent parameters. Thus, in families of
distributions where the irreducible interactions are of fixed size, the independent
parameter space grows polynomially with n, whereas in a general distribution
without any conditional independencies, it grows exponentially with n. The
case of monadic LFP lies in between: the interactions are not of fixed size, but
they grow relatively slowly. The case of complex LFP is also one of
poly(log n)-parametrization, except it is a value-limited O(n) interaction model.
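The gap between n2^k and exponentially many parameters can be made concrete with a small count. This Python sketch assumes binary variables and a directed model in which each node has at most k parents, so that its conditional probability table has at most 2^k free entries. This is one standard way to realize the n2^k count, not necessarily the construction used later in this chapter, and the function names are hypothetical.

```python
import math
from typing import Sequence


def full_joint_params(n: int) -> int:
    """Free parameters of an arbitrary joint distribution on n binary variables."""
    return 2**n - 1


def directed_model_params(parent_counts: Sequence[int]) -> int:
    """Free parameters of a directed model on binary variables: a node with
    j parents needs one probability per configuration of its parents."""
    return sum(2**j for j in parent_counts)


n = 64
for k in (2, 3, math.ceil(math.log2(n)) ** 2):
    # Node i may only use earlier nodes as parents, so it has min(i, k) of them.
    params = directed_model_params([min(i, k) for i in range(n)])
    print(k, params, params <= n * 2**k, params < full_joint_params(n))
```

The last row takes k = (log_2 n)^2, a poly(log n) interaction size, for which the count n2^k = 2^{poly(log n)} is still far below 2^n − 1 when n = 64.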

There are some technical issues with constructing such a hierarchy to
measure conditional independence. The first issue would be how to measure the
level of a distribution in this hierarchy. If, for instance, the distribution has a
directed P-map, then we could measure the size of the largest clique that
appears in its moralized graph. However, as noted in Sec. 2.5, not all distributions
have such maps. We may, of course, upper and lower bound the level using
minimal I-maps and maximal P-maps for the distribution. In the case of
ordered graphs, we should note that there may be different minimal I-maps for
the same distribution for different orderings of the variables. See [KF09, p. 80]
for an example.

The insight that allows us to resolve the issue is as follows. If we could 
somehow embed the distribution of solutions generated by LFP into a larger dis- 
tribution, such that 

1. the larger distribution factorized recursively according to some directed 
graphical model, and 

2. the larger distribution had only polynomially many more variates than 
the original one, 

then we would have obtained a parametrization of our distribution that would 
reflect the factorization of the larger distribution, and would cost us only poly- 
nomially more, which does not affect us. 

By pursuing the above course, we aim to demonstrate that distributions of
solutions generated by LFP lie at a lower level of conditional independence than
distributions that occur in the d1RSB phase of random k-SAT. Consequently,
they have more economical parametrizations than the space of solutions in the
d1RSB phase does.

We will return to the task of constructing such an embedding in Sec. 8.3. 
First we describe how we use LFP to create a distribution of solutions. 

8.2 Generating Distributions from LFP 

We will describe the method of generating distributions and showing economical
parametrizations by embedding the covariates into a larger directed graphical
model below for monadic LFP. We will indicate the differences for complex
LFP.

8.2.1 Encoding k-SAT into Structures

In order to use the framework from Chapters 4 and 5, we will encode k-SAT
formulae as structures over a fixed vocabulary.
Our vocabularies are relational, and so we need only specify the set of
relations, and the set of constants. We will use three relations.

1. The first relation R_C will encode the clauses that a SAT formula comprises.
Since we are studying ensembles of random k-SAT, this relation will have
arity k.

2. We need a relation in order to make FO(LFP) capture polynomial time
queries on the class of k-SAT structures. We will not introduce a linear
ordering since that would make the Gaifman graph a clique. Rather we
will include a relation such that FO(LFP) can capture all the polynomial
time queries on the structure. This will be a binary relation R_E.

3. Lastly, we need a relation R_P to hold "partial assignments" to the SAT
formulae. We will describe these in Sec. 8.2.3.

4. We do not require constants.

This describes our vocabulary

σ = {R_C, R_E, R_P}.

Next, we come to the universe. A SAT formula is defined over n variables,
but they can come either in positive or negative form. Thus, our universe will
have 2n elements corresponding to the literals x_1, …, x_n, ¬x_1, …, ¬x_n. In
order to avoid new notation, we will simply use the same notation to indicate the
corresponding element in the universe. We denote by lower case x_i the literals
of the formula, while the corresponding upper case X_i denotes the
corresponding variable in a model.

Finally, we need to interpret our relations in our universe. We dispense with
the superscripts since the underlying structure is clear. The relation R_C will
consist of k-tuples from the universe interpreted as clauses consisting of
disjunctions between the literals in the tuple. The relation R_E will be interpreted
as an "edge" between successive literals. The relation R_P will be a partial
assignment of values to the underlying variables.
Now we encode our k-SAT formulae into σ-structures in the natural way.
For example, for k = 3, the clause x_1 ∨ ¬x_2 ∨ ¬x_3 in the SAT formula will be
encoded by inserting the tuple (x_1, ¬x_2, ¬x_3) in the relation R_C. Similarly, the
pairs (x_i, x_{i+1}) and (¬x_i, ¬x_{i+1}), both for 1 ≤ i < n, as well as the pair
(x_n, ¬x_1), will be in the relation R_E. This chains together the elements of the structure.
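Here is a minimal Python sketch of this encoding. It makes several assumptions: literals are signed integers (i for x_i, −i for ¬x_i, as in the DIMACS convention), R_P is stored as a set of literals rather than a tuple, and the chain is closed by the pair (x_n, ¬x_1). All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple


@dataclass
class KSatStructure:
    universe: Set[int]           # the 2n literal elements
    r_c: Set[Tuple[int, ...]]    # clauses (arity k)
    r_e: Set[Tuple[int, int]]    # successor-type chain
    r_p: Set[int]                # literals made true by the partial assignment


def encode_ksat(n: int, clauses: Iterable[Tuple[int, ...]],
                partial: Dict[int, bool]) -> KSatStructure:
    universe = set(range(1, n + 1)) | {-i for i in range(1, n + 1)}
    r_c = {tuple(c) for c in clauses}
    r_e: Set[Tuple[int, int]] = set()
    for i in range(1, n):
        r_e.add((i, i + 1))          # x_i -> x_{i+1}
        r_e.add((-i, -(i + 1)))      # not x_i -> not x_{i+1}
    r_e.add((n, -1))                 # assumed link closing the chain
    r_p = {v if value else -v for v, value in partial.items()}
    return KSatStructure(universe, r_c, r_e, r_p)


s = encode_ksat(3, [(1, -2, -3)], {1: True, 2: False, 3: True})
print(sorted(s.r_p))  # [-2, 1, 3], i.e. the tuple (x_1, ¬x_2, x_3)
```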

The reason for the relation R_E that creates the chain is that on such
structures, polynomial time queries are captured by FO(LFP) [EF06, §11.2]. This is
a technicality. Recall that an order on the structure enables the LFP computation
(or the Turing machine that runs this computation) to represent tuples in
a lexicographical ordering. In our problem of k-SAT, it plays no further role.
Specifically, the assignments to the variables that are computed by the LFP have
nothing to do with their order. They depend only on the relation R_C, which
encodes the clauses, and the relation R_P, which holds the initial partial assignment
that we are going to ask the LFP to extend. In other words, each stage of the
LFP is order-invariant. It is known that the class of order invariant queries is also
Gaifman local [GS00]. However, to allow LFP to capture polynomial time on the
class of encodings, we need to give the LFP something it can use to create an
ordering. We could encode our structures with a linear order, but that would
make the Gaifman graph fully connected. What we want is something weaker
that still suffices. Thus, we encode our structures as successor-type structures
through the relation R_E. This seems most natural, since it imparts on the
structure an ordering based on that of the variables. Note also that SAT problems
may also be represented as matrices (rows for clauses, columns for variables
that appear in them), which have a well defined notion of order on them.

Ensembles of k-SAT. Let us now create ensembles of σ-structures using the
encoding described above. We will start with the ensemble SAT_k(n, α) and
encode each k-SAT instance as a σ-structure. The resulting ensemble will be
denoted by S_k(n, α). The encoding of the problem Φ_k(n, α) as a σ-structure will
be denoted by ^ki^, a). 
8.2.2 The LFP Neighborhood System

In this section, we wish to describe the neighborhood system that underlies the
monadic LFP computations on structures of S_k(n, α). We begin with the factor
graph, and build the neighborhood system through the Gaifman graph.

Let us recall the factor graph ensemble G_k(n, m). Each graph in this ensemble
encodes an instance of random k-SAT. We encode the k-SAT instance as
a structure as described in the previous section. Next, we build the Gaifman
graph of each such structure. The set of vertices of the Gaifman graph is simply
the set of variable nodes in the factor graph and their negations, since we
are using both variables and their negations for convenience (this is simply an
implementation detail). For instance, the Gaifman graph for the factor graph of
Fig. 2.2 will have 12 vertices. Two vertices are joined by an edge in the Gaifman
graph either when the two corresponding variable nodes were joined to a single
function node (i.e., appeared in a single clause) of the factor graph or if they are
adjacent to each other in the chain that relation R_E has created on the structure.
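The Gaifman graph construction described above can be sketched in Python as follows, reusing the signed-integer literal convention, which is an assumption made for illustration. Two elements are adjacent when they occur together in a tuple of some relation, so clauses and the chain contribute edges while a unary relation contributes none. The function name is hypothetical.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def gaifman_graph(universe: Iterable[int],
                  relations: Iterable[Iterable[Tuple[int, ...]]]) -> Dict[int, Set[int]]:
    """Two distinct elements are adjacent iff they occur together in some
    tuple of some relation. Unary relations (such as R_P) add no edges."""
    adj: Dict[int, Set[int]] = {a: set() for a in universe}
    for relation in relations:
        for tup in relation:
            for a, b in combinations(set(tup), 2):
                adj[a].add(b)
                adj[b].add(a)
    return adj


# Hypothetical 3-variable example, literals as signed ints (i = x_i, -i = not x_i).
universe = [1, 2, 3, -1, -2, -3]
r_c = {(1, -2, -3)}                                  # one clause
r_e = {(1, 2), (2, 3), (3, -1), (-1, -2), (-2, -3)}  # successor-type chain
g = gaifman_graph(universe, [r_c, r_e])
print(sorted(g[-2]))  # [-3, -1, 1]
```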

On this Gaifman graph, the simple monadic LFP computation induces a
neighborhood system described as follows. The sites of the neighborhood system
are the variable nodes. The neighborhood N_s of a site s is the set of all nodes
that lie in the r-neighborhood of s, where r is the locality rank of the first
order formula whose fixed point is being constructed by the LFP computation.

Finally we make the neighborhood system into a graph in the standard way.
Namely, the vertices of the graph will be the set of sites. Each site s will be
connected by an edge to every other site in N_s. This graph will be called the
interaction graph of the LFP computation. The ensemble of such graphs, parametrized
by the clause density α, will be denoted by I_k(n, α).
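Building the interaction graph from a Gaifman graph amounts to a breadth-first search of radius r around every site. The Python sketch below assumes the Gaifman graph is given as a dictionary of adjacency sets and that r is the locality rank. The names `ball` and `interaction_graph` are hypothetical.

```python
from collections import deque
from collections.abc import Hashable
from typing import Dict, Set


def ball(adj: Dict[Hashable, Set[Hashable]], s: Hashable, r: int) -> Set[Hashable]:
    """All vertices at graph distance at most r from s (breadth-first search)."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if dist[u] == r:
            continue
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist)


def interaction_graph(adj: Dict[Hashable, Set[Hashable]], r: int) -> Dict[Hashable, Set[Hashable]]:
    """Join each site s to every other site in its r-neighborhood N_s.
    Distance is symmetric, so the result is again an undirected graph."""
    return {s: ball(adj, s, r) - {s} for s in adj}


path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(interaction_graph(path, 2))
```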

Note that this interaction graph has many more edges in general than the
Gaifman graph. In particular, every node that was within the locality rank
neighborhood of a site s in the Gaifman graph is now connected to s by a single
edge. The resulting graph is, therefore, far denser than the Gaifman graph.

What is the size of cliques in this interaction graph? This is not the same as 
the size of cliques in the factor graph, or the Gaifman graph, because the density 
of the graph is higher. The size of the largest clique is a random variable. What
we want is an asymptotic almost sure (by this we mean with probability tending
to 1 in the thermodynamic limit) upper bound on the size of the cliques in the
distribution of the ensemble I_k(n, α).

Note: From here on, all the statements we make about ensembles should be under-
stood to hold asymptotically almost surely in the respective random ensembles. By that
we mean that they hold with probability tending to 1 as n → ∞.

Lemma 8.2. The size of cliques that appear in graphs of the ensemble I_k(n, α) is upper
bounded by poly(log n) asymptotically almost surely.

Proof. Let d_max be as in (7.1), and let r be the locality rank of φ. Each literal
lies in at most d_max clauses, each contributing at most k − 1 neighbors, and the
chain R_E contributes at most two more, so the maximum degree of a node in the
Gaifman graph is asymptotically almost surely upper bounded by
(k − 1)d_max + 2 = O(log n). The locality rank is a fixed number (a constant that
depends only on the quantifier depth d of the first order formula that is being
iterated). The node under consideration could have at most O(d_max) others
adjacent to it, and the same for those, and so on. This gives us a coarse O(d_max^r)
upper bound on the size of cliques. ■
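The counting step in this proof can be checked numerically. Every member of a clique of the interaction graph that contains s lies within distance r of s, so the clique fits inside the radius-r ball around s. A ball in a graph of maximum degree d has at most 1 + d + ⋯ + d^r vertices. The Python sketch below compares the largest ball in a sparse random graph with this bound; the random graph model and all names are illustrative assumptions.

```python
import random
from collections import deque
from typing import List, Set


def ball_size(adj: List[Set[int]], s: int, r: int) -> int:
    """Number of vertices within graph distance r of s (breadth-first search)."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if dist[u] == r:
            continue
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return len(dist)


def geometric_bound(d: int, r: int) -> int:
    """1 + d + d^2 + ... + d^r: bounds any radius-r ball when all degrees are
    at most d, and is at most 2 * d**r whenever d >= 2."""
    return sum(d**i for i in range(r + 1))


rng = random.Random(3)
n, r = 2000, 2
adj: List[Set[int]] = [set() for _ in range(n)]
for _ in range(3 * n // 2):  # about 3n/2 random edges, so mean degree close to 3
    u, v = rng.randrange(n), rng.randrange(n)
    if u != v:
        adj[u].add(v)
        adj[v].add(u)
d_max = max(len(nbrs) for nbrs in adj)
largest_ball = max(ball_size(adj, s, r) for s in range(n))
print(d_max, largest_ball, geometric_bound(d_max, r))
```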

Remark 8.3. While this bound is coarse, there is not much point trying to tighten
it because any constant power factor (r in the case above) can always be
introduced by computing an r-ary LFP relation. This bound will be sufficient for
us.

Remark 8.4. High degree nodes in the Gaifman graph become significant fea- 
tures in the interaction graph since they connect a large number of other nodes 
to each other, and therefore allow the LFP computation to access a lot of infor- 
mation through a neighborhood system of given radius. It is these high degree 
nodes that reduce factorization of the joint distribution since they represent
direct interaction of a large number of variables with each other. Note that
although the radii of neighborhoods are O(1), the number of nodes in them is
not O(1) due to the Poisson distribution of the variable node degrees, and the
existence of high degree nodes.
Remark 8.5. The relation being constructed is monadic, and so it does not
introduce new edges into the Gaifman graph at each stage of the LFP computation.
When we compute a k-ary LFP, we can encode it into a monadic LFP over a
polynomially (n^k) larger product space, as is done in the canonical structure,
for instance, but with the linear order replaced by a weaker successor type
relation. Therefore, we can always choose to deal with monadic LFP. This is really a
restatement of the transitivity principle for inductive definitions that says that
if one can write an inductive definition in terms of other inductively defined
relations over a structure, then one can write it directly in terms of the original
relations that existed in the structure [Mos74, p. 16].

8.2.3 Generating Distributions 

The standard scenario in finite model theory is to ask a query about a structure 
and obtain a Yes/No answer. For example, given a graph structure, we may ask
the query "Is the graph connected?" and get an answer. 

But what we want are distributions of solutions that are computed by a
purported LFP algorithm for k-SAT. This is not the usual setting in finite model
theory. Intuitively, we want to generate solutions lying in exponentially many
clusters of the solution space of SAT in the d1RSB phase. How do we do this?
To generate these distributions, we will start with partial assignments to the set
of variables in the formula, and ask whether such a partial assignment can be
extended to a satisfying assignment. We need the following
definition.

Definition 8.6. A global relation associated to a decision problem on a class K is
a relation R^𝔄 of a fixed arity k over A associated to each structure 𝔄 ∈ K.

The following is a restatement of the Immerman-Vardi theorem phrased in 
terms of computability of global relations. See [LR03, §11.2, p. 206] for a proof. 

Theorem 8.7. A global relation R on a class of successor structures is computable in
polynomial time if and only if R is inductive.
We wish to see that the global relation that associates to each structure a
complete assignment that coincides with the partial assignment placed in the 
relation Rp is inductive. By the theorem above, this is equivalent to showing 
that it is computable in polynomial time. In order to see this, we recall that 
decision problems that are NP-complete have a property called self-reducibility 
that allows us to query a decision procedure for them a polynomial number of 
times and build a solution to the search version of the problem. If P = NP, 
then all decision problems in NP have polynomial time solutions, and one can 
use self-reducibility to see that the search version will also be polynomial time 
solvable — namely, a solution will be constructible in polynomial time. Next 
we will define our search problem in a way that a solution to it will be a global 
relation: an instance of the problem will be a structure with partial assignments, 
and the question will be whether the partial assignment can be extended to a 
complete assignment. The complete assignment can be represented by a global 
unary relation that will store all the literals assigned +1, and which must concur 
with the partial assignment on its overlap. This decision problem is clearly in 
NP, and therefore if P = NP, it would have a polynomial time search solution, 
making R computable in polynomial time. The theorem then says R must be 
inductive. 
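The self-reducibility argument can be sketched as follows: fix the free variables one at a time, keeping a value only if a decision procedure says the partial assignment is still extendable. In this Python sketch, `decide` is a brute-force, exponential-time stand-in for the hypothetical polynomial-time decider that P = NP would supply. Only the reduction pattern, which makes at most n + 1 decision calls, is meant to be illustrated, and all names are hypothetical.

```python
from itertools import product
from typing import Dict, List, Optional

Clause = List[int]  # signed-integer literals: i means x_i, -i means not x_i


def satisfies(assignment: Dict[int, bool], clauses: List[Clause]) -> bool:
    return all(any(assignment[abs(l)] == (l > 0) for l in c) for c in clauses)


def decide(n: int, clauses: List[Clause], fixed: Dict[int, bool]) -> bool:
    """Hypothetical decision procedure: can `fixed` be extended to a satisfying
    assignment? Brute force here; it only stands in for a fast decider."""
    free = [v for v in range(1, n + 1) if v not in fixed]
    for bits in product([False, True], repeat=len(free)):
        candidate = dict(fixed)
        candidate.update(zip(free, bits))
        if satisfies(candidate, clauses):
            return True
    return False


def extend(n: int, clauses: List[Clause],
           partial: Dict[int, bool]) -> Optional[Dict[int, bool]]:
    """Search via self-reducibility: at most n + 1 calls to `decide`."""
    if not decide(n, clauses, partial):
        return None
    assignment = dict(partial)
    for v in range(1, n + 1):
        if v in assignment:
            continue
        assignment[v] = True
        if not decide(n, clauses, assignment):
            assignment[v] = False  # must still be extendable, by the previous call
    return assignment


print(extend(3, [[1, 2], [-1, 3], [-2, -3]], {1: True}))  # {1: True, 2: False, 3: True}
```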

Since we want to generate exponentially many such solutions, we will have
to partially assign O(n) (a small fraction) of the variables, and ask the LFP to
extend this assignment, whenever possible, to a satisfying assignment to all
variables. Thus, we now see what the relation R_P in our vocabulary stands for.
It holds the partial assignment to the variables. For example, suppose we want
to ask whether the partial assignment X_1 = 1, X_2 = 0, X_3 = 1 can be extended to
a satisfying assignment to the SAT formula; we would store this partial
assignment in the tuple (x_1, ¬x_2, x_3) in the relation R_P in our structure.

As mentioned earlier, the output satisfying assignment will be computed as
a unary relation that holds all the literals assigned the value 1. This
means that x_i is in the relation if x_i has been assigned the value 1 by the
LFP; otherwise ¬x_i is in the relation, meaning that x_i has been assigned the
value 0 by the LFP computation. This is the simplest case, where the FO(LFP)
formula is simple monadic. For more complex formulas, the output will be some
section of a relation of higher arity (see Appendix A for details), and we will
view it as monadic over a polynomially larger structure.
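The two encodings just described, the tuple placed in R_P and the unary output relation of literals that must concur with it, can be illustrated with a small sketch. The helper names are hypothetical, variables are indexed from 1, and values are 0/1 as in the example above.

```python
def literals_of(assignment: dict[int, int]) -> tuple[str, ...]:
    """Encode {i: 1 or 0} as literals: 'x{i}' for value 1, '¬x{i}' for value 0."""
    return tuple(f"x{i}" if v == 1 else f"¬x{i}" for i, v in sorted(assignment.items()))


def concurs(partial: dict[int, int], full: dict[int, int]) -> bool:
    """The output relation must agree with R_P on every variable that R_P fixes."""
    return all(full.get(i) == v for i, v in partial.items())


partial = {1: 1, 2: 0, 3: 1}
r_p = literals_of(partial)                  # the tuple stored in R_P
full = {1: 1, 2: 0, 3: 1, 4: 0}             # a hypothetical LFP output
output_relation = set(literals_of(full))    # the unary output relation
print(r_p)                                  # ('x1', '¬x2', 'x3')
print(concurs(partial, full))               # True
```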

Now we "initialize" our structure with different partial assignments and
ask the LFP to compute complete assignments when they exist. If a partial
assignment cannot be extended, we simply abort that particular attempt and
carry on with other partial assignments until we generate enough solutions,
where "enough" means a number that grows exponentially with the underlying
problem size. In this way we obtain a distribution over exponentially many
solutions, which we now analyze and compare to the one that arises in the
d1RSB phase of random k-SAT.
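The sampling procedure can be sketched as follows, under explicit assumptions: `extend` is a hypothetical brute-force stand-in for the LFP (it returns the first satisfying extension in a fixed enumeration order, so the resulting distribution is only illustrative), and `frac` plays the role of the small fraction of pre-assigned variables. Attempts whose partial assignment cannot be extended are skipped, as in the text.

```python
import random
from itertools import product
from typing import Optional

Clause = tuple[int, ...]  # +i means x_i, -i means NOT x_i


def extend(clauses: list[Clause], n: int,
           partial: dict[int, bool]) -> Optional[dict[int, bool]]:
    """Hypothetical stand-in for the LFP: a satisfying extension of `partial`, or None."""
    free = [v for v in range(1, n + 1) if v not in partial]
    for bits in product([False, True], repeat=len(free)):
        full = {**partial, **dict(zip(free, bits))}
        if all(any(full[abs(lit)] == (lit > 0) for lit in c) for c in clauses):
            return full
    return None


def sample_solutions(clauses: list[Clause], n: int, frac: float,
                     attempts: int, seed: int = 0) -> set[tuple[bool, ...]]:
    rng = random.Random(seed)
    k = max(1, int(frac * n))                  # number of pre-assigned variables
    found: set[tuple[bool, ...]] = set()
    for _ in range(attempts):
        chosen = rng.sample(range(1, n + 1), k)
        partial = {v: rng.random() < 0.5 for v in chosen}
        full = extend(clauses, n, partial)
        if full is None:                       # cannot be extended: abort this attempt
            continue
        found.add(tuple(full[v] for v in range(1, n + 1)))
    return found


clauses = [(1, -2, 3), (-1, 4, 5), (2, -5, 6), (-3, -4, -6)]
solutions = sample_solutions(clauses, n=6, frac=0.5, attempts=50)
```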

8.3 Disentangling the Interactions: The ENSP Model 

Now that we have a distribution of solutions computed by LFP, we would like 
to examine its conditional independence characteristics. Does it factor through 
any particular graphical model, for instance? 

In Chapter 2, we considered various graphical models and their conditional 
independence characteristics. Once again, our situation is not exactly like any 
of these models. We will have to build our own, based on the principles we 
have learnt. Let us first note two issues. 

The first issue is that the graphical models considered in the literature are
mostly static. By this we mean that

1. they are of fixed size, over a fixed set of variables, and 

2. the relations between the variables encoded in the models are fixed. 

In short, they model fixed interactions between a fixed set of variables. Since 
we wish to apply them to the setting of complexity theory, we are interested in 
families of such models, with a focus on how their structure changes with the 
problem size. 
The second issue that faces us now is as follows. Even within a certain size n,
we do not have a fixed graph on n vertices that will model all our interactions.
The way an LFP computation proceeds through the structure will, in general,
vary with the initial partial assignment. We would expect a different
"trajectory" of the LFP computation for different clusters in the d1RSB phase.
So, if one initial partial assignment landed us in cluster X, and another in
cluster Y, the way the LFP would go about assigning values to the unassigned
variables would, in general, be different. Even within a cluster, the
trajectories of two different initial partial assignments will not be the
same, although we would expect them to be similar. How do we deal with this
situation?

In order to model this dynamic behavior, let us build some intuition first. 

1. We know that there is a "directed-ness" to LFP, in that elements that are
assigned values at a certain stage of the computation then go on to influence
other elements that are as yet unassigned. Thus, there is a directed flow of
influence as the LFP computation progresses. This differs, for example, from
a Markov random field distribution, which has no such direction.

2. There are two types of flows of information in an LFP computation. Consider
simple monadic LFP. In one type of flow, neighborhoods across the structure
influence the value an unassigned node will take. In the other type of flow,
once an element is assigned a value, it changes the neighborhoods (or more
precisely the local types of various other elements) in its vicinity. Note
that while the first type of flow happens during a stage of the LFP, the
second type is implicit. Namely, there is no separate stage of the LFP where
it happens; it happens implicitly once any element enters the relation being
computed.

3. Because the flow of information is as described above, we will not be able 
to express it using a simple DAG on either the set of vertices, or the set of 
neighborhoods. Thus, we have to consider building a graphical model on 
certain larger product spaces. 
4. The stage-wise nature of LFP is central to our analysis, and the various
stages cannot be bundled into one without losing crucial information.
Thus, we do need a model which captures each stage separately.

5. In order to exploit the factorization properties of directed graphical models, 
and the resulting parametrization by potentials, we would like to avoid 
any closed directed paths. 

Let us now incorporate this intuition into a model, which we will call an
Element-Neighborhood-Stage Product model, or ENSP model for short. This model
appears to be of independent interest. We now describe the ENSP model for
a simple monadic least fixed point computation. The model is illustrated in
Fig. 8.1. It has two types of vertices.

Element Vertices These vertices, which encode the variables of the k-SAT
instance, are represented by the smaller circles in Fig. 8.1. They therefore
correspond to elements in the structure (recall that elements of the structure
represent the literals in the k-SAT formula). However, each variable in our
original system $X_1, \ldots, X_n$ is represented by a different vertex at each
stage of the computation. Thus, each variable in the original system gives rise
to $|\varphi^{\mathcal{A}}|$ vertices in the ENSP model. Also recall that there
are 2n elements in the k-SAT structure, where n is the number of variables in
the SAT formula. However, in Fig. 8.1, we have shown only one vertex per
variable, and allowed it to take one of two colors: green, indicating that the
variable has been assigned the value +1, and red, indicating that the variable
has been assigned the value −1. Since the underlying formula φ that is being
iterated is positive, elements do not change their color once they have been
assigned.

Neighborhood Vertices These vertices, denoted by the larger circles with blue
shading in Fig. 8.1, represent the r-neighborhoods of the elements in the
structure. Just like variables, each neighborhood is also represented by a
different vertex at each stage of the LFP computation. Their possible values
are the possible isomorphism types of the r-neighborhoods, namely, the local
r-types of the corresponding element. These vertices may be thought of as
vectors of size poly(log n) corresponding to the cliques that occur in the
neighborhood system we described in Sec. 8.2.2, or one may think of them as a
single variable taking the value of the various local r-types.

Now we describe the stages of the ENSP. There are $2|\varphi^{\mathcal{A}}|$
stages, starting from the leftmost and terminating at the rightmost. Each stage
of the LFP computation is represented by two stages in the ENSP. Initially, at
the start of the LFP computation, we are in the leftmost stage. Here, notice
that some variable vertices are colored green, and some red. In the figure,
$X_{4,1}$ is green, and $X_{j,1}$ is red. This indicates that the initial
partial assignment that we provided to the LFP had variable $X_4$ assigned +1
and variable $X_j$ assigned −1. In this way, a small fraction (O(n) in number)
of the variables are assigned values. The LFP is asked to extend this partial
assignment to a complete satisfying assignment on all variables if one exists,
and to abort if not.

Let us now look at the transition to the second stage of the ENSP. At this
stage, based on the conditions expressed by the formula φ in terms of their
own local neighborhoods, and the existence of a bounded number of other local
neighborhoods in the structure, some elements enter the relation. This means
they get assigned +1 or −1. In the figure, the variable $X_{3,2}$ takes the
color green based on information gathered from its own neighborhood
$N(X_{3,1})$ and two other neighborhoods $N(X_{2,1})$ and $N(X_{n-1,1})$. This
indicates that at the first stage, the LFP assigned the value +1 to the
variable $X_3$. Similarly, it assigns the value −1 to the variable $X_n$
(remember that the first two stages in the ENSP correspond to the first stage
of the LFP computation). The vertices that do not change state simply transmit
their existing state to the corresponding vertices in the next stage by a
horizontal arrow, which we do not show in the figure in order to avoid clutter.

Once some variables have been assigned values in the first stage, their
neighborhoods, and the neighborhoods in their vicinity (meaning the
neighborhoods of other elements that are in their vicinity), change. This is
indicated by the dotted arrows between the second and third stages of the
ENSP. Note that this happens implicitly during the LFP computation. That is
why we have represented each stage of the actual LFP computation by two stages
in the ENSP. The first stage is the explicit stage, where variables get
assigned values. The second stage is the implicit stage, where variables
"update" their own neighborhoods and those neighborhoods in their vicinity.
For example, once $X_3$ has been assigned the value +1, it updates its
neighborhood and also the neighborhood of variable $X_2$, which lies in its
vicinity (in this example). In this way, influence propagates through the
structure during an LFP computation. There are two stages of the ENSP for each
stage of the LFP; thus, there are $2|\varphi^{\mathcal{A}}|$ stages of the ENSP
in all.
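The layered structure just described can be sketched as a small Python builder. Everything here is a hypothetical encoding chosen for illustration: one element vertex ("X", i, t) and one neighborhood vertex ("N", i, t) per variable i and layer t (so the 2n literal elements are collapsed to n, as in Fig. 8.1); layer 0 holds the initial partial assignment (this extra layer is our indexing choice); each LFP stage adds an explicit layer followed by an implicit layer; and the trace of which neighborhoods each newly assigned variable consulted, and which neighborhoods it then updates, is supplied by hand. Because every arrow points from a layer to a later one, the result has no closed directed paths.

```python
Vertex = tuple[str, int, int]  # (kind, variable, layer); kind "X" = element, "N" = neighborhood
Edge = tuple[Vertex, Vertex]


def build_ensp(n: int, lfp_stages: list[dict[int, list[int]]],
               vicinity: dict[int, list[int]]) -> tuple[set[Edge], int]:
    """lfp_stages[s] maps each variable assigned at LFP stage s to the variables whose
    neighborhoods it consulted; vicinity[v] lists the neighborhoods v updates once
    assigned. Returns the edge set and the index of the last layer."""
    edges: set[Edge] = set()
    t = 0
    for stage in lfp_stages:
        for v in range(1, n + 1):               # horizontal carry-over arrows
            for kind in ("X", "N"):
                edges.add(((kind, v, t), (kind, v, t + 1)))
                edges.add(((kind, v, t + 1), (kind, v, t + 2)))
        for v, consulted in stage.items():      # explicit layer: neighborhoods -> element
            for u in consulted:
                edges.add((("N", u, t), ("X", v, t + 1)))
        for v in stage:                         # implicit layer: element -> nearby neighborhoods
            for u in vicinity.get(v, [v]):
                edges.add((("X", v, t + 1), ("N", u, t + 2)))
        t += 2
    return edges, t


# Toy trace loosely following Fig. 8.1 with n = 5: X3 consults N(X3), N(X2), N(X4);
# X5 consults only its own neighborhood; X3 then updates N(X3) and N(X2).
edges, last = build_ensp(5, [{3: [3, 2, 4], 5: [5]}], {3: [3, 2]})
vertices = {x for e in edges for x in e}
print(last, len(vertices))  # 2 30
```

Reading off the "X" vertices of layer `last` recovers the original variables, mirroring the rightmost level of the figure.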

By the end of the computation, all variables have been assigned values, and
we have a satisfying assignment. The variables at the last stage are just the
original $X_i$. Thus, we recover our original variables $(X_1, \ldots, X_n)$ by
simply looking at the last (rightmost in the figure) level of the ENSP.

By introducing extra variables to represent each stage of each variable and 
each neighborhood in the SAT formula, we have accomplished our original 
aim. We have embedded our original set of variates into a polynomially larger 
product space, and obtained a directed graphical model on this larger space. 
This product space has a nice factorization due to the directed graph structure. 
This is what we will exploit. 

Remark 8.8. The explicit stages of the ENSP also perform the task of propagating
the local constraints placed by the various factors in the underlying factor graph
outward into the larger graphical model. For example, in our case of factors
encoding clauses of a k-SAT formula, the local constraint placed by a clause
is that the global assignment must evade exactly one restriction to a specified
set of k coordinates. For instance, in the case k = 3, the clause
$x_1 \vee x_2 \vee \neg x_3$ permits all global assignments except those whose
first three coordinates are (−1, −1, +1). In contrast, if the factor were an
XORSAT clause, the local restrictions would all be in the form of linear spaces,
and so the global solution would be an intersection of such spaces. k-SAT asks
whether certain spaces of the form

$$\{\, x \in \{-1,+1\}^n : (x_{i_1}, x_{i_2}, \ldots, x_{i_k}) \neq (v_{i_1}, v_{i_2}, \ldots, v_{i_k}) \,\}$$

have non-empty intersections, where $1 \le i_1 < i_2 < \cdots < i_k \le n$ and the
prohibited values $v_{i_j}$ are ±1. Note that these are local constraints per
factor. In contrast, XORSAT asks whether certain linear spaces have a non-empty
intersection. Linearity is a global constraint. Of course, all messages are
coded into the formula φ. Thus, the end result of multiple runs of the LFP will
be a space of solutions conditioned upon the requirements. So, for instance, if
we were to try to solve XORSAT formulae, we would obtain a space that would be
linear.
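Both halves of Remark 8.8 can be checked by brute force on tiny instances. The sketch below (hypothetical helper names) confirms that the clause $x_1 \vee x_2 \vee \neg x_3$ rules out exactly one of the $2^3$ restrictions to its coordinates, namely (−1, −1, +1), and that the solution set of a small XORSAT system is closed under $x \oplus y \oplus z$, i.e. is an affine subspace of $\mathrm{GF}(2)^n$. XORSAT variables are written as 0/1 bits rather than ±1, an encoding choice of this sketch.

```python
from itertools import product


def clause_holds(signs: tuple[int, ...], values: tuple[int, ...]) -> bool:
    """A literal with sign +1 is true when its variable is +1; sign -1 when it is -1."""
    return any(s == v for s, v in zip(signs, values))


signs = (+1, +1, -1)  # x1 OR x2 OR NOT x3
violating = [vals for vals in product((-1, +1), repeat=3) if not clause_holds(signs, vals)]
print(violating)  # [(-1, -1, 1)]


def xorsat_solutions(equations: list[tuple[tuple[int, ...], int]],
                     n: int) -> list[tuple[int, ...]]:
    """Each equation (indices, b) requires the XOR of the listed bits to equal b."""
    return [x for x in product((0, 1), repeat=n)
            if all(sum(x[i] for i in idx) % 2 == b for idx, b in equations)]


eqs = [((0, 1, 2), 1), ((1, 3), 0)]
sols = xorsat_solutions(eqs, 4)
closed = set(sols)
is_affine = all(tuple(a ^ b ^ c for a, b, c in zip(x, y, z)) in closed
                for x in sols for y in sols for z in sols)
print(len(sols), is_affine)  # 4 True
```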

Thus, we have a directed graph with 2n + n = 3n vertices at each stage,
and $2|\varphi^{\mathcal{A}}|$ stages. Since the LFP completes its computation in under a fixed
polynomial number of steps, this means that we have managed to represent 
the LFP computation on a structure as a directed model using a polynomial 
overhead in the number of parameters of our representation space. In other 
words, by embedding the covariates into a polynomially larger space, we have 
been able to put a common structure on various computations done by LFP 
on them. Note that without embedding the covariates into a larger space, we 
would not be able to place the various computations done by LFP into a single 
graphical model. The insight that we can afford to incur a polynomial cost in 
order to obtain a common graphical model on a larger product space was key 
to this section. 
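The polynomial overhead can be made explicit with a one-line count. Assuming, as stated above, that the LFP closes within a polynomial number of stages, say $|\varphi^{\mathcal{A}}| \le n^{c}$ for some constant c (the constant c is our notation, not the text's):

$$\underbrace{(2n + n)}_{\text{vertices per stage}} \cdot \underbrace{2\,|\varphi^{\mathcal{A}}|}_{\text{stages}} \;=\; 6n\,|\varphi^{\mathcal{A}}| \;\le\; 6\,n^{c+1} \;=\; \mathrm{poly}(n).$$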

8.4 Parametrization of the ENSP 

Our goal is to demonstrate the following. 

If LFP were able to compute solutions to the d1RSB phase of random k-SAT,
then the distribution of the entire space of solutions would have a
substantially simpler parametrization than we know it does.
In order to accomplish this, we need to measure the growth in the number of
independent parameters required to parametrize the distribution of solutions
that we have just computed using LFP.

In order to do this, we have embedded our variates into a polynomially
larger space that factorizes according to a directed model, the ENSP. We have
seen that the cliques in the ENSP are of size poly(log n). By employing the
version of the Hammersley-Clifford theorem for directed models (Theorem 2.13),
we also know that we can parametrize the distribution by specifying a system of
potentials over its cliques, automatically ensuring conditional independence.

The directed nature of the ENSP also means that we can factor the resulting
distribution into conditional probability distributions (CPDs) at each vertex of
the model, of the form P(x | pa(x)), and then normalize each CPD. Once again,
each CPD will have scope only poly(log n). From our perspective, the major
benefit of directed graphical models is that we can always do this, without any
added positivity constraints. Recall that positivity is required in order to
apply the Hammersley-Clifford theorem to obtain factorizations for undirected
models.
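To see concretely why factoring into CPDs of small scope saves parameters, the sketch below counts the free parameters of a directed model, $\sum_v (|\mathrm{val}(v)| - 1)\prod_{p \in \mathrm{pa}(v)} |\mathrm{val}(p)|$, against the $\prod_v |\mathrm{val}(v)| - 1$ needed for an unrestricted joint distribution. The example, a simple chain of binary variables, is chosen only for illustration and is not the ENSP itself.

```python
from math import prod


def directed_param_count(parents: dict[int, list[int]], card: dict[int, int]) -> int:
    """Free parameters of a directed model: each vertex v needs (|val(v)| - 1)
    numbers for every configuration of its parents."""
    return sum((card[v] - 1) * prod(card[p] for p in pa) for v, pa in parents.items())


def full_joint_param_count(card: dict[int, int]) -> int:
    """Free parameters of an unrestricted joint distribution over the same variables."""
    return prod(card.values()) - 1


n = 20
card = {i: 2 for i in range(n)}                             # binary variables
chain = {i: ([i - 1] if i > 0 else []) for i in range(n)}   # X0 -> X1 -> ... -> X19
print(directed_param_count(chain, card), full_joint_param_count(card))  # 39 1048575
```

The directed count grows with the largest parent set rather than with n, which is the effect the text exploits.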

How do we compute the CPDs or potentials? We assign various initial partial
assignments to the variables as described in Sec. 8.2.3 and let the LFP
computations run. We only consider successful computations, namely those where
the LFP was able to extend the partial assignment to a full satisfying
assignment of the underlying k-SAT formula. We represent each stage of the LFP
computation on the corresponding two stages of the ENSP and thus obtain one
full instantiation of the representation space. We do this exponentially many
times, and build up our local CPDs by simply recording local statistics over
all these runs. This gives us the factorization (over the expanded
representation space) of our distribution, assuming that P = NP.
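Recording local statistics amounts to counting. A minimal sketch, assuming each successful run has already been flattened into a dictionary from ENSP vertices to values and that the parent sets for that run are known: the relative frequency of each value given each observed parent configuration is the maximum-likelihood estimate of the CPD. The vertex names below are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_cpds(runs: list[dict[str, int]], parents: dict[str, list[str]]):
    """Relative frequency of each vertex value given each observed parent configuration."""
    counts = {v: defaultdict(Counter) for v in parents}
    for run in runs:
        for v, pa in parents.items():
            config = tuple(run[p] for p in pa)
            counts[v][config][run[v]] += 1
    return {v: {config: {val: c / sum(ctr.values()) for val, c in ctr.items()}
                for config, ctr in table.items()}
            for v, table in counts.items()}


parents = {"a": [], "b": ["a"]}                     # hypothetical two-vertex fragment
runs = [{"a": 1, "b": 1}, {"a": 1, "b": 1}, {"a": 1, "b": -1}, {"a": -1, "b": -1}]
cpds = estimate_cpds(runs, parents)
print(cpds["a"][()])     # {1: 0.75, -1: 0.25}
print(cpds["b"][(-1,)])  # {-1: 1.0}
```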

The ENSP for different runs of the LFP will, in general, be different. This
is because the flow of influences through the stages of the ENSP will, in
general, depend on the initial partial assignment. What is important is that
all such models will have certain properties in common, such as the largest
clique size, which determines the order of the number of parameters. Let us
inspect these properties that determine the parametrization of the ENSP model.

1. There are polynomially many more vertices in the ENSP model than ele- 
ments in the underlying structure. 

2. Lemma 8.2 gives us a poly(log n) upper bound on the size of the
neighborhoods. The number of local r-types whose value each neighborhood
vertex can take is $2^{\mathrm{poly}(\log n)}$.

3. By Theorem 5.9 there is a fixed constant s such that there must exist s
neighborhoods in the structure satisfying certain local conditions for the
formula to hold. Remember, we are presently analyzing a single stage of
the LFP. This again gives us poly(n) ($\Theta(n^{s})$ in this case) different
possibilities for each explicit stage of the ENSP. The same can also be arrived
at by utilizing the normal form of Theorem 5.10. By the previous point, each
of these possibilities can be parametrized by $2^{\mathrm{poly}(\log n)}$
parameters, giving us a total of $2^{\mathrm{poly}(\log n)}$ parameters required.

4. At each implicit stage of the ENSP, we have to update the types of the 
neighborhoods that were affected by the induction of elements at the pre- 
vious explicit stage. There are only n neighborhoods, and each has poly (log n) 
elements at most. 
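The arithmetic in items 1 to 4 can be combined as follows. This is our reading of the list, not a proof from the text: it assumes the number of stages and vertices contribute at most a poly(n) factor and uses item 3's $\Theta(n^{s})$ choices of neighborhoods, each parametrized by $2^{\mathrm{poly}(\log n)}$ values. The key step is that any polynomial factor is itself of the form $2^{O(\log n)}$ and is therefore absorbed:

$$\mathrm{poly}(n)\cdot\Theta(n^{s})\cdot 2^{\mathrm{poly}(\log n)} \;=\; 2^{O(\log n)}\cdot 2^{s\log_2 n + O(1)}\cdot 2^{\mathrm{poly}(\log n)} \;=\; 2^{\mathrm{poly}(\log n)}.$$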

The ENSP is an interaction model where direct interactions are of size poly(log n),
and are chained together through conditional independencies.

Proposition 8.9. A distribution that factorizes according to the ENSP can be
parametrized with $2^{\mathrm{poly}(\log n)}$ independent parameters. The scope of
the factors in the parametrization grows as poly(log n).

This also underscores the principle that the description of the parameter
space is simpler because it only involves direct interactions between
poly(log n) variates at a time, and then chains these together through
conditional independencies. In the case of the LFP neighborhood system, the
size of the largest cliques is poly(log n) for each single run of the LFP. This
will not change if we were computing using complex fixed points, since the
space of k-types is only polynomially larger than the underlying structure.

The crucial property of the distribution of the ENSP is that it admits a
recursive factorization. This is what drastically reduces the parameter space
required to specify the distribution. It also allows us to parametrize the ENSP
by simply specifying potentials on its maximal cliques, which are of size
poly(log n).

While the entire distribution obtained by LFP may not factor according to
any one ENSP, it is a mixture of distributions, each of which factorizes
according to some ENSP. Next, we analyze the features of such a mixture when
exponentially many instantiations of it are provided. As the reader may intuit,
when such a mixture is asked to provide exponentially many samples, these will
show features of scope poly(log n). This is simply a statement about the paucity
of independent parameters in the component distributions of the mixture.

8.5 Separation 

We continue our treatment of range limited poly(log n)-parametrizations; we
will treat the value limited case shortly. The property of the ENSP for range
limited models that allows us to analyze the behavior of mixtures is that it is
specified by local Gibbs potentials on its cliques. In other words, a variable
interacts with the rest of the model only through the cliques that it is part
of. These cliques are parametrized by potentials. We may think of the cliques
as the building blocks of each ENSP. The cliques are also upper bounded in size
by poly(log n). Furthermore, a vertex may be in at most O(log n) such cliques.
Therefore, a vertex displays collective behavior only of range poly(log n).
Thus, the mixture comprises distributions that can be parametrized by a
subspace of $\mathbb{R}^{2^{\mathrm{poly}(\log n)}}$, in contrast to requiring
the larger space $\mathbb{R}^{c^n}$. This means that when exponentially many
solutions are generated, the features in the mixture will be of size
poly(log n), not of size O(n).
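For a sense of scale, the sketch below compares the base-2 logarithms of the two parameter counts, taking $2^{(\log_2 n)^2}$ as a representative $2^{\mathrm{poly}(\log n)}$ and c = 2 in $c^n$; both choices are illustrative assumptions. The quasi-polynomial count is already smaller at n = 32, and the gap widens rapidly.

```python
import math


def log2_counts(n: int, degree: int = 2, c: float = 2.0) -> tuple[float, float]:
    """Return (log2 of 2**((log2 n)**degree), log2 of c**n)."""
    return math.log2(n) ** degree, n * math.log2(c)


for k in (5, 10, 20):
    n = 2 ** k
    small, large = log2_counts(n)
    print(f"n = 2^{k}: {small:.0f} vs {large:.0f} bits")
# prints 25 vs 32, 100 vs 1024, and 400 vs 1048576 bits
```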

Next, let us examine the value limited case. In this case, the differences are
as follows.

1. The solutions are generated by complex LFP, as sections of inductive
relations of higher arity.

2. There are O(n) interactions at each stage, but the graphical model is
parametrizable with only $2^{\mathrm{poly}(\log n)}$ parameters.

3. Since the interactions are O(n), the Gibbs potentials are specified over
cliques of size O(n).

4. However, the potentials are parametrized with only $2^{\mathrm{poly}(\log n)}$
parameters in spite of having size O(n). If we think of a potential as a CPD,
then the CPDs are wide (have O(n) columns), but are not very long (have only
$2^{\mathrm{poly}(\log n)}$ rows).

How do we analyze mixtures of such potentials? The idea is as follows. The
potentials are already large in their scope (O(n)). So we will create a single
potential over the entire graphical model, which will have scope poly(n) (since
the computation terminates in polynomial time). How do we merge various O(n)
potentials into a single poly(n)-sized potential? And what will be the
resulting parametrization of this merged potential?

In order to merge the potentials, we observe that they have a certain sheaf- 
like property. Namely, since they are CPDs of the same LFP, they must agree 
on overlaps. This means that two CPDs cannot specify different behavior for 
the same priors. Remember, these CPDs are nothing but the rules by which the 
computation proceeds, and these rules are the same for different computations 
since it is the same LFP that is being used. Thus, the final merged potential will 
be compatible with each smaller potential on overlaps. 
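The agreement-on-overlaps condition can be illustrated by gluing rule tables. In this hypothetical sketch each table maps a (vertex, parent configuration) key to the value the computation produces; gluing succeeds exactly when no two tables prescribe different outputs for the same key, mirroring the requirement that CPDs of the same LFP agree on overlaps. It illustrates only this compatibility condition and says nothing by itself about the size of the merged parametrization.

```python
Key = tuple[str, tuple[int, ...]]   # (vertex, values of its parents)


def glue(tables: list[dict[Key, int]]) -> dict[Key, int]:
    """Merge local rule tables into one, refusing if two tables disagree on a shared key."""
    merged: dict[Key, int] = {}
    for table in tables:
        for key, out in table.items():
            if key in merged and merged[key] != out:
                raise ValueError(f"tables disagree on {key!r}")
            merged[key] = out
    return merged


t1 = {("x3", (1, 0)): 1, ("x3", (0, 0)): 0}
t2 = {("x3", (1, 0)): 1, ("x5", (1,)): 0}   # overlaps t1 on one key, consistently
print(len(glue([t1, t2])))                   # 3

try:
    glue([t1, {("x3", (0, 0)): 1}])          # contradicts t1, so gluing fails
except ValueError as err:
    print("not gluable:", err)
```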

Using this property, we can see that if each of the potentials had a
poly(log n)-parametrization, then so must the final merged potential. Once again
we see that we cannot instantiate exponentially many solutions from such a
limited parametrization and obtain the d1RSB picture, which requires ample O(n)
joint distributions.
This explains why polynomial time algorithms fail when interactions between
variables are ample-O(n), without the possibility of factoring into smaller
pieces through conditional independencies. This also puts on rigorous ground
the empirical observation that even NP-complete problems are easy in large
regimes, and become hard only when the densities of constraints increase above
a certain threshold. This threshold is precisely the value where ample
irreducible-O(n) interactions first appear in almost all randomly constructed
instances.

In the case of random k-SAT in the d1RSB phase, these ample irreducible-O(n)
interactions manifest through the appearance of cores, which comprise clauses
whose variables are coupled so tightly that one has to assign them
"simultaneously." Cores arise when a set of C = O(n) clauses have all their
variables also lying in a set of size C. Once the clause density is
sufficiently high, cores cannot be assigned poly(log n) variables at a time,
with successive such assignments chained together through conditional
independencies. Nor are they value limited, since they instantiate in each of
the exponentially many clusters in the d1RSB phase. Since cores do not factor
through conditional independencies, and are not value limited either, it is
impossible for polynomial time algorithms to assign their variables correctly.
Intuitively, variables in a core are so tightly coupled together that they can
only vary jointly, without any conditional independencies between subsets.
Furthermore, their variation is ample. In other words, they represent
irreducible interactions of size O(n) which may not be factored any further,
and which display the ample joint behavior of a system of n covariates, which
requires $O(c^n)$ independent parameters to specify. In such cases,
parametrization over cliques of size only poly(log n) is insufficient to
specify the joint distribution. Likewise, parametrization over cliques of size
O(n), but with only $2^{\mathrm{poly}(\log n)}$ parameters, is insufficient.

We have shown that in the ENSP for range limited models, the size of the
largest such irreducible interactions is poly(log n), not O(n). Furthermore,
since the model is directed, it guarantees us conditional independencies at the
level of its largest interactions. More precisely, it guarantees that there will
exist conditional independencies in sets of size larger than the largest cliques
in its moral graph, which are O(poly(log n)). In other words, there will be
independent variation within cores when conditioned upon values of intermediate
variables that also lie within the core, should the core factorize according to
the ENSP. This is illustrated in Fig. 8.2. This contradicts the known behavior
of cores for sufficiently high values of k and clause density in the d1RSB
phase. In other words, while the space of solutions generated by LFP has
features of size poly(log n), the features present in cores in the d1RSB phase
have size O(n).
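The moral graph mentioned above is obtained mechanically: keep each parent-child edge without its direction and "marry" every pair of co-parents, so that each family {v} ∪ pa(v) becomes a clique. For a directed model, conditional independencies can then be read off by separation in the moral graph of the relevant ancestral set. A minimal sketch (plain Python, hypothetical helper name) on the v-structure a → c ← b:

```python
from itertools import combinations


def moral_graph(parents: dict[str, list[str]]) -> set[frozenset[str]]:
    """Undirected moral graph: drop edge directions and connect every pair of co-parents."""
    edges: set[frozenset[str]] = set()
    for child, pa in parents.items():
        for p in pa:
            edges.add(frozenset((p, child)))   # parent-child edge, direction dropped
        for p, q in combinations(pa, 2):
            edges.add(frozenset((p, q)))       # "marry" the co-parents
    return edges


parents = {"a": [], "b": [], "c": ["a", "b"]}  # the v-structure a -> c <- b
g = moral_graph(parents)
print(sorted(tuple(sorted(e)) for e in g))     # [('a', 'b'), ('a', 'c'), ('b', 'c')]
```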

The framework we have constructed allows us to analyze the set of poly- 
nomial time algorithms simultaneously, since they can all be captured by 
some LFP, instead of dealing with each individual algorithm separately. It 
Figure 8.2: The factorization and conditional independencies within a core due
to potentials of size poly(log n).

At this point, we are ready to state our main theorem.
Theorem 8.10. P ≠ NP.

Proof. Consider the solution space of k-SAT in the d1RSB phase for k > 8, as
recalled in Section 6.2.1. We know that for high enough values of the clause
density α, we have O(n) frozen variables in almost all of the exponentially
many clusters. The first observation we make is that since the variables in
cores are instantiated in exponentially many clusters, we can preclude value
limited poly(log n)-parametrization. Let us then consider the situation where
these clusters were generated by a purported range limited LFP algorithm for
k-SAT that can be parametrized by the ENSP model with clique sizes poly(log n).
When exponentially many solutions have been generated from distributions having
the parametrization of the ENSP model, we will see the effect of conditional
independencies beyond range poly(log n). Let αβγ be a representation of the
variables in cliques α, β and γ. Then, given a value of β, we will see
independent variation over all their possible conditional values in the
variables of α and γ. If each set of such variables has scope at most
poly(log n), then once we have generated more than $2^{\mathrm{poly}(\log n)}$
distinct solutions, we will have non-trivial conditional distributions
conditioned upon values of the β variables. At this point, the conditional
independencies ensure that we will see cross terms of the form

$$\alpha_1\beta\gamma_1, \quad \alpha_2\beta\gamma_2, \quad \alpha_1\beta\gamma_2, \quad \alpha_2\beta\gamma_1.$$

Note that since O(n) variables have to be changed when jumping from one cluster
to another, we may even choose our poly(log n) blocks to lie in overlaps of
these variables. This would mean that with a poly(log n) change in frozen
variables of one cluster, we would get a solution in another cluster. But we
know that in the highly constrained phases of d1RSB, we need O(n) variable flips
to get from one cluster to the next. This gives us the contradiction that we
seek. ■
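The cross-term step can be illustrated on toy blocks. The sketch assumes, as in the proof, that the α and γ blocks are conditionally independent given β; the conditional support is then a product set, so every pairing of an observed α value with an observed γ value appears. The block contents below are arbitrary 0/1 strings chosen for illustration. A solution and its cross term differ only inside the γ block, so their Hamming distance is at most |γ|.

```python
from itertools import product


def hamming(u: tuple[int, ...], v: tuple[int, ...]) -> int:
    return sum(a != b for a, b in zip(u, v))


def cross_terms(alphas: list[tuple[int, ...]], beta: tuple[int, ...],
                gammas: list[tuple[int, ...]]) -> set[tuple[int, ...]]:
    """Support of (alpha, beta, gamma) when alpha and gamma are independent given beta."""
    return {a + beta + g for a, g in product(alphas, gammas)}


alphas = [(0, 0, 1), (1, 1, 0)]         # two observed values of the α block
beta = (1, 0)                           # one fixed value of the conditioning block β
gammas = [(1, 0, 0, 1), (0, 1, 1, 0)]   # two observed values of the γ block
support = cross_terms(alphas, beta, gammas)
x = alphas[0] + beta + gammas[0]        # α1 β γ1
y = alphas[0] + beta + gammas[1]        # α1 β γ2, a cross term
print(len(support), hamming(x, y))      # 4 4
```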

The basic question in analyzing such mixtures is: how many variables do we
need to condition upon in order to split the distribution into conditionally
independent pieces? The answer is given by (a) the size of the largest cliques
and (b) the number of such cliques that a single variable can occur in. In our
case, these two give us a poly(log n) quantity. When exponentially many
solutions have been generated, there will be conditional distributions that
exhibit conditional independence between blocks of variates of size
poly(log n). Namely, there will be no effect of the values of one block upon
those of the other. This is what prevents the Hamming distance between
solutions from being O(n). This is shown pictorially in Fig. 8.2.

We may think of such mixtures as possessing only $2^{\mathrm{poly}(\log n)}$
"channels" to communicate directly with other variables. All long range
correlations transmitted in such a distribution must pass through only these
many channels. Therefore, exponentially many solutions cannot independently
transmit O(n) correlations (namely, the variables that have to be changed when
jumping from one cluster to another). Their correlations must factor through
this bottleneck, which gives us conditional independences after range
poly(log n). This means that blocks of size larger than this vary independently
of each other when conditioned upon some intermediate variables. This gives us
the cross terms described earlier, and prevents the Hamming distance from being
O(n) on the average over exponentially many solutions. Instead, it must be
poly(log n).

We can see that due to the limited parameter space that determines each 
variable, it can only display a limited joint behavior. This behavior is completely 
determined by poly (log n) other variates, not by 0{n) other variates. Thus, the 
"jointness" in this distribution lies at a level poly (log n). This is why when 
enough solutions have been generated by the LFP, the resulting distribution 
will start showing features that are at most of size poly (log n). In other words, 
there will be solutions that show cross-terms between features whose size is 
poly (log n). 

It is also useful to consider how many different parametrizations a block of
size poly(log n) may have. Each variable may choose poly(log n) partners out of
O(n) to form a potential. It may choose O(log n) such potentials. Even coarsely,
this means that blocks of variables of size poly(log n) only "see" the rest of
the distribution through equivalence classes whose number grows as
$O(n^{\mathrm{poly}(\log n)})$. This quantity would have to grow exponentially
with n in order to display the behavior of the d1RSB phase. Once again we return
to the same point: the jointness of the distribution that a purported LFP
algorithm would generate would lie at the poly(log n) levels of conditional
independence, whereas the jointness in the distribution of the d1RSB solution
space is truly O(n). Namely, there are irreducible interactions of size O(n)
that cannot be expressed as interactions between poly(log n) variates at a time,
chained together by conditional independencies as would be done by an LFP. This
is central to the separation of complexity classes. Hard regimes of NP-complete
problems allow O(n) variates to irreducibly jointly vary, and accounting for
such O(n) jointness that cannot be factored any further is beyond the capability
of polynomial time algorithms.
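The count of equivalence classes above can be spelled out under stated assumptions: blocks of size $p = (\log n)^{b}$ for a constant b (our notation), O(log n) potentials per variable, and the bound $\binom{n}{p} \le n^{p}$ for the choice of partners:

$$\left(\binom{n}{p}\right)^{O(\log n)} \;\le\; \left(n^{p}\right)^{O(\log n)} \;=\; n^{O((\log n)^{b+1})} \;=\; 2^{O((\log n)^{b+2})} \;=\; 2^{\mathrm{poly}(\log n)},$$

which is quasi-polynomial in n, whereas the text requires a count that grows exponentially with n to display the d1RSB behavior.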
We collect some observations in the following. 

Remark 8.11. The poly(log n) size of features, and therefore of the Hamming
distance between solutions, tells us that polynomial time algorithms correspond
to the RS phase of the 1RSB picture, not to the d1RSB phase.

Remark 8.12. We can see from the preceding discussion that the number of
independent parameters required to specify the distribution of the entire
solution space in the d1RSB phase (for k > 8) rises as $c^n$, c > 1. This is
because it takes that many parameters to specify the exponentially many
O(n)-variable "jumps" between the clusters. These jumps are independent, and
cannot be factored through poly(log n)-sized factors, since that would mean
conditional independence of pieces of size poly(log n) and would ensure that the
Hamming distance between solutions was of that order.

Remark 8.13. Note that the central notion is that of the number of independent
parameters, not frozen variables. For example, frozen variables would occur
even in low dimensional parametrizations in the presence of additional
constraints placed by the problem. This is what happens in XORSAT, where the
linearity of the problem causes frozen variables to occur. The frozen variables
in XORSAT do not arise due to a high dimensional parametrization, but simply
because the 2-core percolates [MM09, §18.3]. Each cluster is a linear space
tagged onto a solution for the 2-core, which is also why the clusters are all of
the same size. Linear spaces always admit a simple description as the linear
span of a basis, whose size is of the order of the logarithm of the size of the space.
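The linear-algebra fact invoked in Remark 8.13 can be checked on a toy system. The sketch below (plain Python; the 6-variable instance and the helper names `solve_gf2` and `affine_span` are hypothetical) runs Gauss-Jordan elimination over GF(2) to obtain one solution $x_0$ and a basis of the kernel; every solution is then $x_0$ plus a sum of basis vectors, so $d + 1$ vectors describe all $2^d$ solutions. It illustrates only the linearity, not the cluster structure of random XORSAT.

```python
from itertools import product


def solve_gf2(A, b):
    """Gauss-Jordan elimination over GF(2) on the system A x = b.
    Returns (x0, basis) such that the solution set is x0 + span(basis),
    or None if the system is inconsistent."""
    m, n = len(A), len(A[0])
    rows = [list(A[i]) + [b[i]] for i in range(m)]  # augmented matrix
    pivots = []
    r = 0
    for c in range(n):
        p = next((i for i in range(r, m) if rows[i][c]), None)
        if p is None:
            continue  # no pivot in this column: x_c is a free variable
        rows[r], rows[p] = rows[p], rows[r]
        for i in range(m):
            if i != r and rows[i][c]:
                rows[i] = [u ^ v for u, v in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    # A row reading 0 = 1 signals an inconsistent system.
    if any(not any(row[:n]) and row[n] for row in rows):
        return None
    x0 = [0] * n
    for i, c in enumerate(pivots):
        x0[c] = rows[i][n]
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        v = [0] * n
        v[f] = 1
        for i, c in enumerate(pivots):
            v[c] = rows[i][f]  # pivot variable cancels x_f in row i
        basis.append(v)
    return x0, basis


def affine_span(x0, basis):
    """All points x0 + (sum of a subset of basis), with addition mod 2."""
    points = set()
    for coeffs in product((0, 1), repeat=len(basis)):
        x = list(x0)
        for use, v in zip(coeffs, basis):
            if use:
                x = [p ^ q for p, q in zip(x, v)]
        points.add(tuple(x))
    return points


# Hypothetical 3-XORSAT instance: each row encodes x_i + x_j + x_k = b (mod 2).
A = [[1, 1, 1, 0, 0, 0],
     [0, 1, 1, 1, 0, 0],
     [0, 0, 0, 1, 1, 1]]
b = [1, 0, 1]
x0, basis = solve_gf2(A, b)
brute = {x for x in product((0, 1), repeat=6)
         if all(sum(a * xi for a, xi in zip(row, x)) % 2 == bi
                for row, bi in zip(A, b))}
print(len(brute), len(basis), affine_span(x0, basis) == brute)  # 8 3 True
```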

Remark 8.14. It is tempting to think that there will be such a parametrization
whenever the algorithmic procedure used to generate the solutions is stage-wise
local. This is not so. We need the added requirement that "mistakes" are
not allowed; namely, we cannot change a decision that has been made.
Otherwise, even PFP has the stage-wise bounded local property, but it can give rise to
distributions without any conditional independence factorizations whose
factors are of size poly(log n). When placed in the ENSP, we see that there is
factorization, but over an exponentially larger space, where cliques are of
exponential size. One might observe that it is the requirement that no
trial and error be made at all that limits LFP computations, in a manner
fundamentally different from the locality of information flows. See [Put65] for an interesting
related notion of "trial and error predicates" in computability theory.
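The distinction drawn in Remark 8.14 is visible already at the level of stage sequences. In this hedged sketch (plain Python; the universe, both operators and the helper names are hypothetical), stages are computed from the empty set. For a monotone operator each stage contains the previous one, so an element, once added, is never withdrawn; this is the "no mistakes" property of LFP. For a non-monotone operator iterated under partial fixed point semantics, elements are added and later withdrawn.

```python
def iterate_stages(op, max_steps=1000):
    """Stages X_0 = {}, X_{i+1} = op(X_i). Stops at a fixed point (returns True)
    or at the first repeated stage, i.e. a cycle (returns False)."""
    stages = [frozenset()]
    for _ in range(max_steps):
        nxt = frozenset(op(stages[-1]))
        if nxt == stages[-1]:
            return stages, True
        if nxt in stages:
            return stages + [nxt], False
        stages.append(nxt)
    raise RuntimeError("no fixed point or cycle found within max_steps")


def retractions(stages):
    """Elements that a stage drops again, i.e. 'decisions' that get revised."""
    return [sorted(prev - cur) for prev, cur in zip(stages, stages[1:]) if prev - cur]


# Monotone (positive) operator: nodes reachable from 0 along edges E. LFP applies.
E = {(0, 1), (1, 2), (2, 3)}
def reach(X):
    return {0} | {v for (u, v) in E if u in X}

# Non-monotone operator (X occurs negatively), so only PFP semantics applies:
# phi(x, X) = (x = 0 and not X(1)) or (x = 1 and X(0)) or (x = 2 and X(1)).
def flip(X):
    out = set()
    if 1 not in X:
        out.add(0)
    if 0 in X:
        out.add(1)
    if 1 in X:
        out.add(2)
    return out

lfp_stages, lfp_ok = iterate_stages(reach)
pfp_stages, pfp_ok = iterate_stages(flip)
print(lfp_ok, retractions(lfp_stages))   # True []
print(pfp_ok, retractions(pfp_stages))   # False [[0], [1], [2]]
```

The second sequence never stabilizes, so its partial fixed point is empty; the point of the sketch is only that its stages withdraw elements, which never happens for the monotone operator.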

8.6 Some Perspectives 

The following perspectives are reinforced by this work. 

1. The most natural object of study for constraint satisfaction problems is the
entire space of solutions. It is in this space that the dependencies and
independencies which the CSP imposes on the covariates that satisfy it become manifest.

2. There is an intimate relation between the geometry of the space and its 
parametrization. Studying the parametrization of the space of solutions is 
a worthwhile pursuit. 

3. The view that an algorithm is a means to generate one solution is limited 
in the sense that it is oblivious to the geometry of the space of all solutions. 
It may, of course, be the appropriate approach in many applications. But 
there are applications where requiring algorithms to generate numerous 
solutions and approximate with increasing accuracy the entire space of 
solutions seems more natural. 

4. Conditional independence over factors of small scope is at the heart of
resolving CSPs by means of polynomial time algorithms. In other words,
polynomial time algorithms succeed by successively "breaking up" the
problem into smaller subproblems that are joined to each other through
conditional independence. Consequently, polynomial time algorithms
cannot solve problems in regimes where blocks whose size is of the same order as the
underlying problem instance require simultaneous resolution.

5. Polynomial time algorithms resolve the variables in CSPs in a certain
order, and with a certain structure. This structure is important in their study.
In order to bring this structure under study, we may have to embed the
space of covariates into a larger space (as done by the ENSP).

A. Reduction to a Single LFP Operation



A.1 The Transitivity Theorem for LFP 

We now gather a few results that will enable us to cast any LFP into one having 
just one application of the LFP operator. Since we use this construction to deal 
with complex fixed points, we reproduce it in this appendix. The presentation 
here closely follows [EF06, Ch. 8]. 

The first result, known as the transitivity theorem, tells us that nested fixed
points can always be replaced by simultaneous fixed points. Let $\varphi(\bar{x}, X, Y)$ and
$\psi(\bar{y}, X, Y)$ be first order formulas positive in $X$ and $Y$. Moreover, assume that
no individual variable free in $[\mathrm{LFP}_{\bar{y},Y}\,\psi(\bar{y}, X, Y)]$ gets into the scope of a
corresponding quantifier or LFP operator in (A.1):

$$[\mathrm{LFP}_{\bar{x},X}\,\varphi(\bar{x}, X, [\mathrm{LFP}_{\bar{y},Y}\,\psi(\bar{y}, X, Y)])]\,\bar{t} \tag{A.1}$$

Then (A.1) is equivalent to a formula of the form

$$\exists \bar{v}\,[\mathrm{LFP}_{\bar{z},Z}\,\chi(\bar{z}, Z)]\,\bar{u},$$

where $\chi$ is first order.
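The transitivity theorem rests on a set-level identity: for monotone operators on a finite universe, the nested least fixed point $\mu X.\,\varphi(X, \mu Y.\,\psi(X, Y))$ coincides with the $X$-component of the simultaneous least fixed point of $(X, Y) \mapsto (\varphi(X, Y), \psi(X, Y))$ (an identity usually credited to Bekić). The sketch below checks it on a toy graph. It works with sets rather than formulas, ignores the tuple variables and the existential quantifier of the theorem, and all names (`phi`, `psi`, `E1`, `E2`, `inner`) are hypothetical.

```python
def lfp(op, bottom):
    """Least fixed point of a monotone operator, by iterating from `bottom`."""
    x = bottom
    while (nxt := op(x)) != x:
        x = nxt
    return x


def succ(E, S):
    """Successors of the set S along the edge set E."""
    return frozenset(v for (u, v) in E if u in S)


EMPTY = frozenset()
E1 = {(0, 1), (1, 2), (3, 4)}  # hypothetical edges used by psi
E2 = {(2, 3), (4, 5)}          # hypothetical edges used by phi

def phi(X, Y):                 # monotone (positive) in X and Y
    return X | frozenset({0}) | succ(E2, Y)

def psi(X, Y):                 # monotone (positive) in X and Y
    return X | succ(E1, Y)

def inner(X):
    """[LFP_Y psi(X, Y)] with X held fixed as a parameter."""
    return lfp(lambda Y: psi(X, Y), EMPTY)

# Nested: outer LFP over X, recomputing the inner LFP at every outer stage.
nested = lfp(lambda X: phi(X, inner(X)), EMPTY)
# Simultaneous: a single LFP over pairs (X, Y).
simul_X, simul_Y = lfp(lambda p: (phi(*p), psi(*p)), (EMPTY, EMPTY))

print(sorted(nested), sorted(simul_X))   # [0, 3, 5] [0, 3, 5]
```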

A.2 Sections and the Simultaneous Induction Lemma for LFP

Next we deal with simultaneous fixed points. Recall that simultaneous inductions
do not increase the expressive power of LFP. The proof utilizes a coding
procedure whereby each simultaneous induction is embedded as a section in a
single LFP operation of higher arity. First, we introduce the notion of a section.

Definition A.1. Let $R$ be a relation of arity $(k + l)$ on $A$ and $\bar{a} \in A^{l}$. Then the
$\bar{a}$-section of $R$, denoted by $R_{\bar{a}}$, is given by

$$R_{\bar{a}} := \{\bar{b} \in A^{k} \mid R(\bar{b}\,\bar{a})\}.$$
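Read operationally, Definition A.1 selects the tuples of $R$ that end in $\bar a$ and strips that suffix. A minimal sketch, assuming relations are represented as Python sets of tuples and using a hypothetical helper name `section`:

```python
def section(R, a):
    """a-section of a relation R (a set of tuples): R_a = {b | R(b a)},
    i.e. the prefixes b of those tuples in R that end in the tuple a."""
    a = tuple(a)
    out = set()
    for t in R:
        k = len(t) - len(a)
        if k >= 0 and t[k:] == a:
            out.add(t[:k])
    return out


R = {(1, "x", "y"), (2, "x", "y"), (3, "x", "z")}
print(sorted(section(R, ("x", "y"))))   # [(1,), (2,)]
```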

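Next we see how sections can be used to encode multiple simultaneous
operators producing relations of lower arity into a single operator producing a
relation of higher arity. Let $m$ operators $F_1, \dots, F_m$ act as follows, where
$\mathcal{P}(\cdot)$ denotes the power set:

$$\begin{aligned}
F_1 &: \mathcal{P}(A^{k_1}) \times \cdots \times \mathcal{P}(A^{k_m}) \to \mathcal{P}(A^{k_1})\\
F_2 &: \mathcal{P}(A^{k_1}) \times \cdots \times \mathcal{P}(A^{k_m}) \to \mathcal{P}(A^{k_2})\\
&\;\;\vdots\\
F_m &: \mathcal{P}(A^{k_1}) \times \cdots \times \mathcal{P}(A^{k_m}) \to \mathcal{P}(A^{k_m})
\end{aligned} \tag{A.2}$$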



We wish to embed these operators as sections of a "larger" operator, which
is known as their simultaneous join.

We will denote a tuple consisting only of $a$'s by $\bar{a}$. The length of $\bar{a}$ will be clear
from context.

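Definition A.2. Let $F_1, \dots, F_m$ be operators acting as above. Set

$$k := \max\{k_1, \dots, k_m\} + m + 1.$$

The simultaneous join of $F_1, \dots, F_m$, denoted by $J(F_1, \dots, F_m)$, is an operator
acting as

$$J(F_1, \dots, F_m) : \mathcal{P}(A^{k}) \to \mathcal{P}(A^{k})$$

such that for any distinct $a, b \in A$, the $\bar{a}\bar{b}^{i}$-section (where the length of $\bar{a}$
here is $k - k_i - i$) of the $n$-th power $J^{n}$ is the $n$-th power $F_i^{n}$ of the operator
$F_i$, that is, the $i$-th component of the $n$-th stage of the simultaneous induction.
Concretely, the simultaneous join is given by

$$J(R) := \bigcup_{\substack{a, b \in A \\ a \neq b}} \Big( \big(F_1(R_{\bar{a}\bar{b}}, \dots, R_{\bar{a}\bar{b}^{m}}) \times \{\bar{a}\bar{b}\}\big) \cup \cdots \cup \big(F_m(R_{\bar{a}\bar{b}}, \dots, R_{\bar{a}\bar{b}^{m}}) \times \{\bar{a}\bar{b}^{m}\}\big) \Big). \tag{A.3}$$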

The simultaneous join operator defined above has properties we will need 
to use. These are collected below. 

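Lemma A.3. The $n$-th power $J^{n}$ of the simultaneous join operator satisfies

$$J^{n} = \bigcup_{\substack{a, b \in A \\ a \neq b}} \big( (F_1^{n} \times \{\bar{a}\bar{b}\}) \cup \cdots \cup (F_m^{n} \times \{\bar{a}\bar{b}^{m}\}) \big), \tag{A.4}$$

where $(F_1^{n}, \dots, F_m^{n})$ is the $n$-th stage of the simultaneous induction of $F_1, \dots, F_m$.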

The following corollaries are now immediate.

Corollary A.4. The fixed point $J^{\infty}$ of the simultaneous join of the operators $(F_1, \dots, F_m)$
exists if and only if their simultaneous fixed point $(F_1^{\infty}, \dots, F_m^{\infty})$ exists.

Corollary A.5. The simultaneous join of inductive operators is inductive.
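The construction can be exercised on a small instance. The following sketch (plain Python; the graph, the even/odd walk operators and all helper names are hypothetical) builds the simultaneous join of three operators of arities 2, 2 and 1, tagging the output of $F_i$ with $\bar a\bar b^{i}$ where $\bar a$ has length $k - k_i - i$, and checks at every stage that $J^n$ equals the union of the tagged stages $F_i^n$. Because the stages of $J$ track those of the simultaneous induction, both reach their fixed point together.

```python
from itertools import permutations


def section(R, a):
    """R_a = {b | R(b a)}: prefixes of the tuples in R that end in the tuple a."""
    out = set()
    for t in R:
        k = len(t) - len(a)
        if k >= 0 and t[k:] == a:
            out.add(t[:k])
    return out


def simultaneous_stages(ops, n_steps):
    """Stages (F_1^n, ..., F_m^n), n = 0..n_steps, of the simultaneous induction."""
    stage = tuple(frozenset() for _ in ops)
    out = [stage]
    for _ in range(n_steps):
        stage = tuple(frozenset(F(*stage)) for F in ops)
        out.append(stage)
    return out


def simultaneous_join(ops, arities, A):
    """Simultaneous join: the output of F_i is tagged with a^(k - k_i - i) b^i."""
    m = len(ops)
    k = max(arities) + m + 1

    def tag(i, a, b):  # i is 1-based
        return (a,) * (k - arities[i - 1] - i) + (b,) * i

    def J(R):
        out = set()
        for a, b in permutations(A, 2):  # ordered pairs with a != b
            args = [frozenset(section(R, tag(j, a, b))) for j in range(1, m + 1)]
            for i, F in enumerate(ops, start=1):
                out |= {c + tag(i, a, b) for c in F(*args)}
        return frozenset(out)

    return J, tag


# Hypothetical example on the path 0 -> 1 -> 2 -> 3.
A = [0, 1, 2, 3]
E = {(0, 1), (1, 2), (2, 3)}

def F_even(even, odd, src):  # pairs joined by a walk of even length
    return {(x, x) for x in A} | {(x, y) for (x, z) in E for (z2, y) in odd if z == z2}

def F_odd(even, odd, src):   # pairs joined by a walk of odd length
    return {(x, y) for (x, z) in E for (z2, y) in even if z == z2}

def F_src(even, odd, src):   # unary: nodes with an even-length walk to 3
    return {(x,) for (x, y) in even if y == 3}

ops, arities = [F_even, F_odd, F_src], [2, 2, 1]
J, tag = simultaneous_join(ops, arities, A)
stages = simultaneous_stages(ops, 8)

R = frozenset()
for stage in stages:
    # J^n must be the union of the tagged stages F_i^n.
    expected = frozenset(c + tag(i, a, b)
                         for a, b in permutations(A, 2)
                         for i, Fi in enumerate(stage, start=1)
                         for c in Fi)
    assert R == expected
    R = J(R)

print(sorted(section(R, tag(3, 0, 1))))   # [(1,), (3,)]
```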

Finally, we need to show that the simultaneous join can itself be expressed
as an LFP computation. We need formulas that will help us define sections of a
simultaneous induction. Since the sections are coded using tuples of the form
$a^{k-(k_i+i)}b^{i}$, we will need formulas that can express this.

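Definition A.6. For $\ell \geq 2$ and $i = 1, \dots, \ell - 1$, the section formulas $S_i^{\ell}(x_1, \dots, x_\ell, v, w)$ are

$$S_i^{\ell}(x_1, \dots, x_\ell, v, w) := \neg(v = w) \wedge (x_1 = \cdots = x_{\ell-i} = v) \wedge (x_{\ell-i+1} = \cdots = x_\ell = w). \tag{A.5}$$

For distinct $a, b \in A$ and $\bar{a}\bar{b}^{j}$ of length $\ell$, $\mathfrak{A} \models S_i^{\ell}[\bar{a}\bar{b}^{j}, a, b]$ if and only if $i = j$.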
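The decoding property can be confirmed by brute force for small $\ell$. In the sketch below (plain Python), the hypothetical predicate `S` checks that $v \neq w$, that the first $\ell - i$ entries equal $v$, and that the last $i$ entries equal $w$; it is then evaluated on every tag $a^{\ell-j}b^{j}$.

```python
def S(l, i, xs, v, w):
    """Section formula S^l_i(x_1, ..., x_l, v, w) as a predicate:
    v != w, the first l - i entries equal v, the last i entries equal w."""
    assert len(xs) == l and 1 <= i <= l - 1
    return (v != w
            and all(x == v for x in xs[: l - i])
            and all(x == w for x in xs[l - i:]))


# For distinct a, b the tag a^(l-j) b^j satisfies S^l_i exactly when i == j.
a, b = "a", "b"
for l in range(2, 8):
    for i in range(1, l):
        for j in range(1, l):
            tag = (a,) * (l - j) + (b,) * j
            assert S(l, i, tag, a, b) == (i == j)
print("ok")
```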

Now we are ready to show that simultaneous fixed-point inductions of formulas
can be replaced by the fixed point induction of a single formula.

Definition A.7. Let

$$\varphi_1(R_1, \dots, R_m, \bar{x}_1), \dots, \varphi_m(R_1, \dots, R_m, \bar{x}_m)$$

be formulas of LFP. As always, we let $R_i$ be a $k_i$-ary relation and $\bar{x}_i$ be a $k_i$-tuple.
Furthermore, let $\varphi_1, \dots, \varphi_m$ be positive in $R_1, \dots, R_m$. Set $k := \max\{k_1, \dots, k_m\} + m + 1$.
Define a new first order formula $\chi$ having $k$ free variables and computing a
single $k$-ary relation $Z$ by

$$\begin{aligned}
\chi(Z, z_1, \dots, z_k) := \exists v\, \exists w\, \Big( \neg\, v = w \wedge \big(
&(\varphi_1(Z_{\bar{v}\bar{w}}, \dots, Z_{\bar{v}\bar{w}^{m}}, z_1, \dots, z_{k_1}) \wedge S_1^{k-k_1}(z_{k_1+1}, \dots, z_k, v, w))\\
\vee\, &(\varphi_2(Z_{\bar{v}\bar{w}}, \dots, Z_{\bar{v}\bar{w}^{m}}, z_1, \dots, z_{k_2}) \wedge S_2^{k-k_2}(z_{k_2+1}, \dots, z_k, v, w))\\
&\;\;\vdots\\
\vee\, &(\varphi_m(Z_{\bar{v}\bar{w}}, \dots, Z_{\bar{v}\bar{w}^{m}}, z_1, \dots, z_{k_m}) \wedge S_m^{k-k_m}(z_{k_m+1}, \dots, z_k, v, w))\big)\Big)
\end{aligned} \tag{A.6}$$

Then the relation computed by the least fixed point of $\chi$ contains all the
individual least fixed points computed by the simultaneous induction as its
sections.



Bibliography 



[ACO08] D. Achlioptas and A. Coja-Oghlan. Algorithmic barriers from
phase transitions. arXiv:0803.2122v2 [math.CO], 2008.

[AM00] Srinivas M. Aji and Robert J. McEliece. The generalized distributive
law. IEEE Trans. Inform. Theory, 46(2):325-343, 2000.

[AP04] Dimitris Achlioptas and Yuval Peres. The threshold for random
k-SAT is 2^k log 2 - O(k). J. Amer. Math. Soc., 17(4):947-973 (electronic),
2004.

[ART06] Dimitris Achlioptas and Federico Ricci-Tersenghi. On the solution-space
geometry of random constraint satisfaction problems. In
STOC'06: Proceedings of the 38th Annual ACM Symposium on Theory
of Computing, pages 130-139. ACM, New York, 2006.

[AV91] Serge Abiteboul and Victor Vianu. Datalog extensions for database
queries and updates. J. Comput. Syst. Sci., 43(1):62-124, 1991.

[AV95] Serge Abiteboul and Victor Vianu. Computing with first-order
logic. Journal of Computer and System Sciences, 50:309-335, 1995.

[BDG95] José Luis Balcázar, Josep Díaz, and Joaquim Gabarró. Structural
complexity. I. Texts in Theoretical Computer Science. An EATCS
Series. Springer-Verlag, Berlin, second edition, 1995.

[Bes74] Julian Besag. Spatial interaction and the statistical analysis of
lattice systems. J. Roy. Statist. Soc. Ser. B, 36:192-236, 1974. With
discussion by D. R. Cox, A. G. Hawkes, P. Clifford, P. Whittle, K. Ord,
R. Mead, J. M. Hammersley, and M. S. Bartlett and with a reply by
the author.

[BGS75] Theodore Baker, John Gill, and Robert Solovay. Relativizations of
the P =? NP question. SIAM J. Comput., 4(4):431-442, 1975.

[Bis06] Christopher M. Bishop. Pattern recognition and machine learning.
Information Science and Statistics. Springer, New York, 2006.

[BMW00] G. Biroli, R. Monasson, and M. Weigt. A variational description
of the ground state structure in random satisfiability problems.
The European Physical Journal B, 14:551-568, 2000.

[CF86] Ming-Te Chao and John V. Franco. Probabilistic analysis of
two heuristics for the 3-satisfiability problem. SIAM J. Comput.,
15(4):1106-1118, 1986.

[CKT91] Peter Cheeseman, Bob Kanefsky, and William M. Taylor. Where
the really hard problems are. In IJCAI, pages 331-340, 1991.

[CO09] A. Coja-Oghlan. A better algorithm for random k-SAT.
arXiv:0902.3583v1 [math.CO], 2009.

[Coo71] Stephen A. Cook. The complexity of theorem-proving procedures.
In STOC '71: Proceedings of the third annual ACM symposium on
Theory of computing, pages 151-158, New York, NY, USA, 1971. ACM
Press.

[Coo06] Stephen Cook. The P versus NP problem. In The millennium prize
problems, pages 87-104. Clay Math. Inst., Cambridge, MA, 2006.

[Daw79] A. P. Dawid. Conditional independence in statistical theory. J. Roy.
Statist. Soc. Ser. B, 41(1):1-31, 1979.

[Daw80] A. Philip Dawid. Conditional independence for statistical
operations. Ann. Statist., 8(3):598-617, 1980.



[Deo10] Vinay Deolalikar. A distribution centric approach to constraint
satisfaction problems. Under preparation, 2010.

[DLW95] Anuj Dawar, Steven Lindell, and Scott Weinstein. Infinitary logic
and inductive definability over finite structures. Inform. and
Comput., 119(2):160-175, 1995.

[DMMZ08] Hervé Daudé, Marc Mézard, Thierry Mora, and Riccardo
Zecchina. Pairs of SAT-assignments in random Boolean formulae.
Theor. Comput. Sci., 393(1-3):260-279, 2008.

[Dob68] R. L. Dobrushin. The description of a random field by means of
conditional probabilities and conditions on its regularity. Theory
Prob. Appl., 13:197-224, 1968.

[Edm65] Jack Edmonds. Minimum partition of a matroid into independent
subsets. Journal of Research of the National Bureau of Standards,
69:67-72, 1965.

[EF06] Heinz-Dieter Ebbinghaus and Jörg Flum. Finite model theory.
Springer Monographs in Mathematics. Springer-Verlag, Berlin,
enlarged edition, 2006.

[Fag74] Ronald Fagin. Generalized first-order spectra and polynomial-time
recognizable sets. In Complexity of computation (Proc. SIAM-AMS
Sympos. Appl. Math., New York, 1973), pages 43-73. SIAM-AMS
Proc., Vol. VII. Amer. Math. Soc., Providence, R.I., 1974.

[Fri99] E. Friedgut. Necessary and sufficient conditions for sharp
thresholds and the k-sat problem. J. Amer. Math. Soc., 12(20):1017-1054,
1999.

[FSV95] Ronald Fagin, Larry J. Stockmeyer, and Moshe Y. Vardi. On
monadic NP vs. monadic co-NP. Inf. Comput., 120(1):78-92, 1995.



[Gai82] Haim Gaifman. On local and nonlocal properties. In Proceedings of
the Herbrand symposium (Marseilles, 1981), volume 107 of Stud. Logic
Found. Math., pages 105-135, Amsterdam, 1982. North-Holland.

[GG84] Stuart Geman and Donald Geman. Stochastic relaxation, Gibbs
distributions and the Bayesian restoration of images. IEEE
Transactions on Pattern Analysis and Machine Intelligence, 6(6):721-741,
November 1984.

[GJ79] Michael R. Garey and David S. Johnson. Computers and intractability.
W. H. Freeman and Co., San Francisco, Calif., 1979. A guide
to the theory of NP-completeness, A Series of Books in the
Mathematical Sciences.

[GS00] Martin Grohe and Thomas Schwentick. Locality of order-invariant
first-order formulas. ACM Trans. Comput. Log., 1(1):112-130, 2000.

[Han65] William Hanf. Model-theoretic methods in the study of elementary
logic. In Theory of Models (Proc. 1963 Internat. Sympos. Berkeley),
pages 132-145. North-Holland, Amsterdam, 1965.

[HC71] J. M. Hammersley and P. Clifford. Markov fields on finite graphs
and lattices. 1971.

[HH76] J. Hartmanis and J. E. Hopcroft. Independence results in computer
science. SIGACT News, 8(4):13-24, 1976.

[Hod93] Wilfrid Hodges. Model theory, volume 42 of Encyclopedia of
Mathematics and its Applications. Cambridge University Press,
Cambridge, 1993.

[Imm82] Neil Immerman. Relational queries computable in polynomial
time (extended abstract). In STOC '82: Proceedings of the fourteenth
annual ACM symposium on Theory of computing, pages 147-152, New
York, NY, USA, 1982. ACM.



[Imm86] Neil Immerman. Relational queries computable in polynomial
time. Inform. and Control, 68(1-3):86-104, 1986.

[Imm99] Neil Immerman. Descriptive complexity. Graduate Texts in
Computer Science. Springer-Verlag, New York, 1999.

[Kar72] R. M. Karp. Reducibility among combinatorial problems. In R. E.
Miller and J. W. Thatcher, editors, Complexity of Computer
Computations, pages 85-103. Plenum Press, 1972.

[KF09] D. Koller and N. Friedman. Probabilistic Graphical Models: Principles
and Techniques. MIT Press, 2009.

[KFaL98] Frank R. Kschischang, Brendan J. Frey, and Hans-Andrea Loeliger.
Factor graphs and the sum-product algorithm. IEEE Transactions
on Information Theory, 47:498-519, 1998.

[KMRT+06] Florent Krzakala, Andrea Montanari, Federico Ricci-Tersenghi,
Guilhem Semerjian, and Lenka Zdeborová. Gibbs states and the
set of solutions of random constraint satisfaction problems. CoRR,
abs/cond-mat/0612365, 2006.

[KMRT+07] Florent Krzakala, Andrea Montanari, Federico Ricci-Tersenghi,
Guilhem Semerjian, and Lenka Zdeborová. Gibbs states and the
set of solutions of random constraint satisfaction problems. Proc.
Natl. Acad. Sci. USA, 104(25):10318-10323 (electronic), 2007.

[KS80] R. Kinderman and J. L. Snell. Markov random fields and their
applications. American Mathematical Society, 1:1-142, 1980.

[KS94] Scott Kirkpatrick and Bart Selman. Critical behavior in the
satisfiability of random Boolean formulae. Science, 264:1297-1301, 1994.

[KSC84] Harri Kiiveri, T. P. Speed, and J. B. Carlin. Recursive causal models.
J. Austral. Math. Soc. Ser. A, 36(1):30-52, 1984.



[Lau96] Steffen L. Lauritzen. Graphical models, volume 17 of Oxford
Statistical Science Series. The Clarendon Press, Oxford University Press,
New York, 1996. Oxford Science Publications.

[LDLL90] S. L. Lauritzen, A. P. Dawid, B. N. Larsen, and H.-G. Leimer.
Independence properties of directed Markov fields. Networks,
20(5):491-505, 1990. Special issue on influence diagrams.

[Lev73] Leonid A. Levin. Universal sequential search problems. Problems
of Information Transmission, 9(3), 1973.

[Li09] Stan Z. Li. Markov random field modeling in image analysis.
Advances in Pattern Recognition. Springer-Verlag London Ltd.,
London, third edition, 2009. With forewords by Anil K. Jain and Rama
Chellappa.

[Lib04] Leonid Libkin. Elements of finite model theory. Texts in Theoretical
Computer Science. An EATCS Series. Springer-Verlag, Berlin, 2004.

[Lin05] S. Lindell. Computing monadic fixed points in linear
time on doubly linked data structures. Available online at
http://citeseerx.ist.psu.edu/doi=10.1.1.122.144:7, 2005.

[LR03] Richard Lassaigne and Michel De Rougemont. Logic and
Complexity. Springer-Verlag, London, 2003.

[MA02] Cristopher Moore and Dimitris Achlioptas. Random k-SAT: Two
moments suffice to cross a sharp threshold. FOCS, pages 779-788,
2002.

[MM09] Marc Mézard and Andrea Montanari. Information, physics, and
computation. Oxford Graduate Texts. Oxford University Press, Oxford,
2009.



[MMW05] Elitza N. Maneva, Elchanan Mossel, and Martin J. Wainwright. A
new look at survey propagation and its generalizations. In SODA,
pages 1089-1098, 2005.

[MMW07] Elitza Maneva, Elchanan Mossel, and Martin J. Wainwright. A
new look at survey propagation and its generalizations. J. ACM,
54(4):Art. 17, 41 pp. (electronic), 2007.

[MMZ05] M. Mézard, T. Mora, and R. Zecchina. Clustering of solutions in
the random satisfiability problem. Phys. Rev. Lett., 94(19):197-205,
May 2005.

[Mos74] Yiannis N. Moschovakis. Elementary induction on abstract structures.
North-Holland Publishing Co., Amsterdam, 1974. Studies in Logic
and the Foundations of Mathematics, Vol. 77.

[Mou74] John Moussouris. Gibbs and Markov random systems with
constraints. J. Statist. Phys., 10:11-33, 1974.

[MPV87] Marc Mézard, Giorgio Parisi, and Miguel Angel Virasoro. Spin
glass theory and beyond, volume 9 of World Scientific Lecture Notes in
Physics. World Scientific Publishing Co. Inc., Teaneck, NJ, 1987.

[MPZ02] M. Mézard, G. Parisi, and R. Zecchina. Analytic and algorithmic
solution of random satisfiability problems. Science, 297(August):812-815, 2002.

[MRTS07] Andrea Montanari, Federico Ricci-Tersenghi, and Guilhem
Semerjian. Solving constraint satisfaction problems through belief
propagation-guided decimation, Sep 2007.

[MSL92] David Mitchell, Bart Selman, and Hector Levesque. Hard and easy
distributions of SAT problems. In AAAI, pages 459-465, 1992.

[MZ97] Rémi Monasson and Riccardo Zecchina. Statistical mechanics of
the random k-satisfiability model. Phys. Rev. E, 56(2):1357-1370,
Aug 1997.

[MZ02] Marc Mézard and Riccardo Zecchina. Random k-satisfiability
problem: From an analytic solution to an efficient algorithm. Phys.
Rev. E, 66(5):056126, Nov 2002.

[Put65] Hilary Putnam. Trial and error predicates and the solution to a
problem of Mostowski. J. Symb. Log., 30(1):49-57, 1965.

[RR97] Alexander A. Razborov and Steven Rudich. Natural proofs. J.
Comput. System Sci., 55(1, part 1):24-35, 1997. 26th Annual ACM
Symposium on the Theory of Computing (STOC '94) (Montreal,
PQ, 1994).

[SB99] Thomas Schwentick and Klaus Barthelmann. Local normal forms
for first-order logic with applications to games and automata. In
Discrete Mathematics and Theoretical Computer Science, pages 444-454.
Springer Verlag, 1999.

[See96] Detlef Seese. Linear time computable problems and first-order
descriptions. Math. Structures Comput. Sci., 6(6):505-526, 1996. Joint
COMPUGRAPH/SEMAGRAPH Workshop on Graph Rewriting
and Computation (Volterra, 1995).

[Sip92] Michael Sipser. The history and status of the P versus NP question.
In STOC, pages 603-618, 1992.

[Sip97] M. Sipser. Introduction to the Theory of Computation. PWS Publishing
Company, 1997.

[Var82] Moshe Y. Vardi. The complexity of relational query languages
(extended abstract). In STOC '82: Proceedings of the fourteenth annual
ACM symposium on Theory of computing, pages 137-146, New York,
NY, USA, 1982. ACM.

[Wig07] Avi Wigderson. P, NP, and Mathematics - a computational
complexity perspective. Proceedings of the ICM 2006, 1:665-712, 2007.


